Abstract

The  study  of  visual  navigation  problems  requires  the  integration  of  visual  processes 
with  motor  control  processes.  Most  essential  in  approaching  this  integration  is  the  study  of 
appropriate  spatio-temporal  representations  which  the  system  computes  from  the  imagery 
and  which  serve  as  interfaces  to  all  cognitive  and  motor  activities.  Since  representations 
resulting  from  exact  quantitative  reconstruction  have  turned  out  to  be  very  hard  to  obtain, 
we argue here for the necessity of representations which can be computed easily, reliably
and  in  real  time  and  which  recover  only  the  information  about  the  3D  world  which  is  really 
needed  in  order  to  solve  the  navigational  problems  at  hand.  In  this  paper  we  introduce  a 
number  of  such  representations  capturing  aspects  of  3D  motion  and  scene  structure  which 
are  used  for  the  solution  of  navigational  problems  implemented  in  visual  servo  systems. 
1 Introduction


Sensory  motor  activity,  as  it  appears  in  nature,  is  largely  controlled  as  a  servomechanism. 
For  instance,  a  person  who  is  just  learning  to  drive  generally  keeps  the  car  on  the  road  by 
fixing  his  attention  on  the  edge  of  the  road  and  comparing  the  location  of  this  edge  with 
some  object  on  the  car,  such  as  the  hood  cap.  If  this  distance  is  too  small,  the  learner  reacts 
by turning the steering wheel to the left;¹ if it gets too large, he reacts by turning the steering
wheel  to  the  right.  It  is  characteristic  of  the  learner  that  his  driving  consists  of  a  series  of 
oscillations  about  the  desired  position.  As  the  driver  improves,  he  introduces  anticipation, 
or  derivative  control.  In  this  condition  a  driver  takes  into  consideration  the  rate  at  which 
he  is  approaching  the  correct  distance  from  the  edge  of  the  road;  eventually  his  control 
on  the  steering  wheel  becomes  more  sophisticated,  possibly  turning  into  a  combination  of 
proportional,  derivative  and  integral  control. 
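The progression from proportional to derivative control can be illustrated with a toy simulation. The sketch below assumes a hypothetical, highly simplified vehicle model in which the steering command directly sets the lateral acceleration of the car, so the deviation from the desired distance to the road edge behaves like a double integrator. The function name, gains and time step are illustrative choices, not taken from the text. Under proportional control alone the error keeps oscillating about the desired position, much like the learner; adding a derivative term damps the oscillation.

```python
def simulate_steering(kp, kd=0.0, ki=0.0, offset0=1.0, dt=0.01, steps=2000):
    """Hypothetical toy model of lateral-error control under PID steering.

    Assumed plant: error'' = steer (a double integrator), where the error is the
    deviation from the desired distance to the road edge.
    """
    error, rate, integral = offset0, 0.0, 0.0
    trace = []
    for _ in range(steps):
        integral += error * dt
        # Proportional, derivative and integral terms acting on the error.
        steer = -(kp * error + kd * rate + ki * integral)
        # Semi-implicit Euler step of the double integrator.
        rate += steer * dt
        error += rate * dt
        trace.append(error)
    return trace


learner = simulate_steering(kp=4.0)            # proportional control only
expert = simulate_steering(kp=4.0, kd=4.0)     # proportional + derivative control
print(max(abs(e) for e in learner[-300:]))     # stays close to 1: sustained oscillation
print(max(abs(e) for e in expert[-300:]))      # essentially 0: oscillation damped
```

With these illustrative gains the derivative term makes the closed loop critically damped; an integral term (`ki`) would matter only if a constant disturbance, such as a crowned road, pushed the car sideways.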

This  and  other  examples  from  nature  demonstrate  that  a  continuous  interplay  between 
visual  processing  and  motor  activity  is  a  characteristic  of  most  existing  systems  that  interact 
with  their  environments.  In  the  psychological  literature  systematic  investigations  have  been 
conducted  on  the  role  of  vision  in  human  motor  control  [41].  It  has  been  concluded  that  for 
human  motor  coordination  tasks,  such  as  hand-eye  coordination,  there  are  two  phases  to  a 
movement:  an  impulse  phase  that  performs  the  initial  motion  (for  instance,  that  moves  the 
arm  most  of  the  way  to  the  target)  and  a  controlled  phase  that  provides  final  adjustments. 
In  the  absence  of  vision,  the  accuracy  of  the  movements  is  greatly  reduced. 

Similar  phenomena,  of  course,  arise  in  robotics.  Most  robotics  systems  of  the  past  did 
not  use  perception,  but  were  designed  to  perform  a  pre-specified  set  of  motions.  As  the  fields 
of  Robotics  and  Computational  Vision  mature  and  increasingly  complicated  autonomous 
systems  are  studied,  they  have  to  deal  with  both  perception  and  action.  Computational 
theories  are  needed  that  explain  perceptual  capabilities,  motor  control  procedures,  as  well 
as  theories  underlying  the  integration  of  vision  and  motor  control  into  a  working  system. 

Initial  attempts  at  these  difficult  problems  followed  a  modular  approach.  The  goal  of 
Computational  Vision  was  defined  as  the  reconstruction  of  an  accurate  description  of  the 
system’s  spatiotemporal  environment.  This  description  amounts  to  knowledge  of  the  robot’s 
3D  motion  relative  to  any  point  in  its  environment  and  the  depths  or  shapes  of  the  surfaces 
in  view.  Assuming  that  this  information  can  be  acquired  exactly,  sensory  feedback  robotics 
was  concerned  with  the  planning  and  execution  of  the  robot’s  activities.  The  problem  with 
such  separation  of  perception  from  action  was  that  both  computational  goals  turned  out  to 
be  intractable.  Complete  visual  reconstruction  is  generally  an  ill-posed  problem,  and  can 
only  lead  to  satisfying  results  if  exact  models  of  the  robot’s  environment  are  available.  On 
the  other  hand,  spatial  planning  and  motion  control  in  3D  space  are  very  sensitive  to  errors 
in  the  description  of  the  spatiotemporal  environment.  Furthermore,  this  approach  did  not 
consider  any  complexity  issues.  Visual  reconstruction  is  very  time-consuming,  the  usual 
control  problems  require  very  expensive  matrix  computations,  and  many  spatial  planning 
problems  are  intractable.  Real  systems  are  bounded  in  their  computational  capacity  and 
have  to  operate  in  real  time. 

¹ If the car has a left-hand drive.

It has been suggested in different fields (in Computer Vision, Artificial Intelligence, as well as Robotics Engineering) that the major problem with this modular approach is that
it  does  not  enforce  a  tight  coupling  of  perception  and  action.  In  Computer  Vision  the  major 
approach  of  visual  reconstruction,  which  aims  for  complete  general  representations,  has  been 
criticized. It has been argued that Computational Vision, in order to be successful,
has  to  study  vision  for  systems  which  are  active  and  purposive  [1,  2,  5,  6].  Recently  a  number 
of  studies  have  been  published  which  argue  for  a  closer  coupling  by  means  of  achieving 
solutions  to  a  number  of  specialized  visuo-motor  control  problems,  usually  problems  for 
which  a  model  of  the  environment  is  available  [30,  31].  The  field  of  AI  brought  forward  the 
approaches  of  Connectionism  and  Behavior-Based  Robotics  [8],  which  propose  to  address 
the  study  of  intelligent  behavior  through  the  construction  of  working  mechanisms.  In  the 
classical  engineering  literature,  we  encounter  the  so-called  approach  of  image-based  control 
[4,  13,  34,  40].  The  principle  behind  this  approach  is  that  instead  of  computing  intermediate 
representations,  directly  available  image  measurements  are  used  as  feedback  for  the  control 
loop. The image measurements in question are image features that can supposedly be extracted easily from the imagery, such as areas of faces, lengths of edges, angles,
slopes  of  lines,  or  centroids  of  faces,  if  well-defined  geometrical  objects  are  considered.  Most 
commonly,  a  number  of  feature  points  extracted  from  the  image  are  tracked  over  time.  The 
idea  behind  this  is  that  the  chosen  image  features  can  be  directly  related  to  the  parameters 
of  the  robot’s  joints  and  thus  a  map,  the  so-called  Perceptual  Kinematic  Map  [20],  is  created 
which  relates  the  joint  space  directly  to  the  image  space.  However,  image  features  of  this 
kind,  in  general,  are  not  easily  extractable.  The  tracking  of  a  number  of  points  over  a  number 
of  frames  constitutes  an  ill-posed  problem.  Furthermore,  even  simple  kinematic  maps  are  no 
longer simple when it comes to inverting them. Thus the approach could manage only problems with very simple configurations: it limited the robot's degrees of mobility and considered only scenes for which geometrical models were available.

Here  we  argue  that  from  the  viewpoint  of  computational  perception  the  essence  of 
understanding the coupling of perception and action will come from understanding the appropriate spatiotemporal representations which the system computes from the imagery and which serve as interfaces to all cognitive and motor activities. These representations have to be easily obtainable, and they have to be computable reliably and in real time. The contribution of this paper lies in introducing such representations for 3D motion and shape. By
considering  three  tasks  requiring  visual  information  of  increasing  complexity  we  introduce 
representations  which  capture  a  system’s  own  motion,  the  motions  of  objects  in  the  scene, 
and  the  shape  of  the  scene.  We  also  show  how  these  representations  can  be  integrated  into 
visual  servo  systems  based  on  dynamic  feedback  control. 

The  organization  of  this  paper  is  as  follows.  Section  2  describes  the  visual  reconstruction 
paradigm.  It  discusses  the  computations  usually  employed  to  derive  from  a  sequence  of 
images  estimates  of  3D  motion  and  the  shape  of  the  scene  in  view.  It  then  outlines  the 
approach  taken  in  this  paper  and  briefly  describes  the  three  tasks  addressed.  The  following 
sections  are  devoted  to  single  tasks;  first  the  task  is  described,  second  the  representations 
computed  from  the  visual  information  are  given,  and  third  the  feedback  control  based  on 
these  representations  is  discussed.  In  particular  Section  3  starts  with  a  description  of  the 
robotic  system  used  in  all  three  tasks,  followed  by  a  description  of  the  task  of  moving  toward 
a  fixed  direction.  In  the  subsections  on  the  visual  representations  the  global  constraints 
relating 2D flow field information to 3D egomotion information are described generally before
it  is  explained  how  these  constraints  can  be  used  in  the  servomechanism  of  a  control  system. 
Then  the  control  is  discussed.  Section  4  outlines  the  second  task,  pursuing  a  moving  target, 
which  utilizes  as  visual  representation  the  time  to  contact,  a  function  of  the  speed  of  the 
robot  and  the  relative  depths  of  objects  in  the  scene.  Similarly,  Section  5  is  devoted  to  the 
third  task,  perimeter  following,  which  uses  as  visual  representation  a  function  of  the  shape  of 
the  scene.  Next,  experiments  related  to  all  three  tasks  are  described,  and  finally  a  summary 
of  the  work  is  given. 

2  3D  Motion  and  Shape  Estimation 

Problems  of  perception  that  require  some  3D  motion  and  shape  information  have  usually  been 
addressed  in  the  context  of  general  visual  recovery.  They  were  approached  by  first  segmenting 
the  image  into  areas  of  the  same  relative  motion  and  then  reconstructing  the  motion  and 
shape  of  every  point  in  the  scene.  The  computational  theory  behind  this  approach,  known 
as  “structure  from  motion”,  suggests  solving  the  problem  in  two  stages.  First,  accurate 
image  displacements  between  consecutive  frames  have  to  be  computed,  either  in  the  form 
of  point  correspondences  [15,  38]  or  as  dense  motion  fields  (optical  flow  fields)  [3,  21,  23]. 
In  a  second  step,  the  3D  motion  and  structure  are  derived  from  constraints  due  to  the 
geometric  transformation  between  the  views  relating  the  local  2D  image  measurements  to 
3D  parameters  [9,  22,  24,  26,  27,  35,  37].  Then,  in  order  to  solve  the  particular  problems  at 
hand,  subsets  of  the  computed  structure  and  motion  parameters  have  been  utilized  as  inputs 
to  non-visual  cognitive  processes,  such  as  planning  or  control. 

Despite  the  clear  methodology  and  formalism  of  this  computational  theory,  it  turned 
out that in both computational steps extreme difficulties are inherently involved. The computation of optical flow and correspondence in the general case is an ill-posed problem, and additional assumptions must be made in order to solve it. Recovering 3D motion from inexact or noisy flow fields has turned out to be a problem of extreme sensitivity. As a result, navigational problems have basically been treated as numerical analysis problems where sophisticated techniques (such as singular value decomposition, simulated annealing, Kalman
filtering,  maximum  likelihood  estimation,  etc.)  have  been  employed  in  order  to  estimate  the 
3D  motion  from  the  geometric  and  photometric  constraints  which  relate  local  image  motion 
to  the  3D  world. 

To be successful in solving real-time navigational tasks we need to find a way of addressing the motion problem that uses easily obtainable information and, ideally, uses that information only to obtain the aspects of 3D motion and structure that we really need. For example, we may not need to know the exact motion parameters, but only approximations to the translation and the rotation, or whether there is significant motion at all, or whether there is significant rotation, or what the time to collision is. Or we may be interested in using motion as a depth cue or shape cue; in that case, only the depth matters, or only the derivatives of depth. Usually, qualitative information about depth is all we care about.

Since our goal is to study perception from a computational point of view, it is computational considerations that guide us in the study and development of visual representations. Thus we classify visual representations according to the complexity of the computational models employed to relate image measurements to spatiotemporal representations and according to the complexity of the visual input used.

Obviously,  not  every  motion  is  of  the  same  complexity.  Rigid  motion  is  less  complex 
than  affine  motion,  and  affine  motion  is  less  complex  than  a  motion  which  can  be  modeled 
by  a  projective  transformation,  or  than  any  other  non-rigid  motion. 

It  is  less  obvious,  however,  that  not  all  rigid  motion  requires  the  same  computational 
model  and  the  same  input.  For  example,  whereas  information  about  the  egomotion  of  a 
system  can  be  computed  from  global  image  information  over  the  entire  image  plane,  and  thus 
image  information  can  be  employed  in  a  redundant  way,  information  about  the  motions  of 
small  objects  in  the  scene  can  only  be  derived  from  a  small  field  of  view,  and  thus  quantitative 
local  measurements  have  to  be  used. 

In this paper, by considering three visuo-motor servo control tasks, we present representations for 3D rigid motion and shape of increasing levels of complexity. In all three tasks we consider a robot system consisting of a body and a camera which can move independently of the body. The first and simplest task consists of changing the robot's direction of motion towards a fixed direction using visual information. For this task we need only partial egomotion information about the robot. This can be derived using the global constraints introduced in [18] and [19], which relate the sign of local image-flow information to the direction of translation and rotation. In the second task the robot must pursue a moving target
while  keeping  a  certain  distance  from  the  target.  In  addition  to  computing  its  own  motion 
the  system  also  has  to  derive  some  information  about  its  speed  relative  to  the  target.  The 
additional  visual  representation  computed  is  the  time  to  collision.  In  the  third  and  last  task, 
the  robot  has  to  follow  a  perimeter.  This  task  requires  the  system  to  compute  some  form  of 
depth  information  about  the  perimeter.  The  depth  representation  employed  is  less  complex 
than  the  classical  ones  of  (scaled)  distance  that  are  usually  used.  It  is  a  function  of  scaled 
shape  which  can  be  derived  without  first  computing  3D  motion. 

3  Task  One:  Moving  Towards  a  Fixed  Direction 

The  robotic  system  considered  in  all  three  tasks  consists,  as  illustrated  in  Figure  1,  of  a 
body on wheels with a camera on top of the body. To describe the system's degrees of freedom we define two coordinate systems, attached to the camera and to the robot, respectively. The robot moves on a surface and is constrained in its movement to a forward translation $T_R$ (along $z_R$) and a rotation $\omega_R$ around the vertical axis, i.e., the $y_R$-axis. The camera is positioned along a vertical axis that passes through the center of rotation of the robot, and it has two independent rotational degrees of freedom. Its orientation measured with respect to the coordinate frame of the robot is given by its tilt, $\theta_x$, and its pan, $\theta_y$; there is no roll, i.e.,

$$\theta_z = 0.$$

We  first  consider  the  simple  task  of  moving  towards  a  new  direction.  Referring  to  the 
mobile  robot  illustrated  in  Figure  2,  the  problem  can  be  stated  as  follows:  a  robot,  moving 
forward  with  speed  S ,  is  required  to  head  towards  a  new  direction,  along  which  some  feature 
p lies; p is selected beforehand by some higher-level process. The robot first directs the camera at p, so that the line of sight is now positioned along p. We assume such gaze shifts are accomplished by fast saccadic movements.

Figure 1: Camera and robot local coordinate systems.



Figure  2:  The  robot,  currently  moving  forward  with  speed  S,  aims  to  veer  towards  p.  The 
dotted  path  represents  the  trajectory  generated  in-flight  by  the  servo  system. 

The  robot  must  now  make  a  series  of  steering  decisions  so  that  eventually  its  heading 
direction  is  aligned  with  where  the  camera  is  pointing.  To  be  more  accurate,  since  the  robot 
moves  on  a  surface  and  the  feature  in  view  can  be  at  any  height,  we  only  want  the  direction  of 
the forward motion and the camera's line of sight to have the same projection on the xz-plane of the camera coordinate system, i.e., $\theta_y$ has to become zero. These steering movements
are  controlled  by  a  servomechanism,  which  derives  information  from  images  captured  by  the 
camera. 

The  next  section  will  be  devoted  to  a  discussion  of  the  visual  features,  or  more  exactly, 
the  visual  patterns,  employed  by  the  servomechanism  and  the  manner  in  which  they  are 
made  use  of.  It  will  be  followed  by  an  analysis  of  the  servomechanism  itself. 
3.1 Global motion patterns


The visual input that has traditionally been used in the computational analysis of visual motion is the optical flow field, which stems from the movement of light patterns in the image plane.
Optical  flow  fields  constitute  a  good  approximation  to  the  projection  of  the  real  3D  motion 
at  scene  points  on  the  image  [33,  39].  In  general,  however,  accurate  values  of  optical  flow 
are  not  computable.  On  the  basis  of  local  information  only  the  component  of  optical  flow 
perpendicular  to  edges,  the  so-called  normal  flow,  is  well  defined  (the  aperture  problem, 
see  Figure  3).  In  many  cases  it  is  possible  to  obtain  additional  flow  information  for  areas 
(patches)  in  the  image.  Thus,  the  input  that  can  be  used  by  perceptual  systems  for  further 
motion  processing  is  some  partial  optical  flow  information. 
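To illustrate why only the normal component is locally recoverable, the sketch below estimates normal flow from two frames. It assumes the standard brightness-constancy constraint $I_x u + I_y v + I_t = 0$, which the text does not spell out here; that single equation determines only the flow component along the unit gradient direction, namely $-I_t/\|\nabla I\|$. The frame layout (lists of rows) and the names `normal_flow` and `ramp` are hypothetical.

```python
import math


def normal_flow(frame0, frame1, dt=1.0, min_grad=1e-6):
    """Hypothetical helper: estimate normal flow at interior pixels of two frames.

    Frames are lists of rows of equal size. Spatial derivatives use central
    differences on the frame average, the temporal derivative a forward
    difference. Returns {(row, col): (u_n, n_x, n_y)}, where (n_x, n_y) is the
    unit gradient direction and u_n the signed speed along it.
    """
    rows, cols = len(frame0), len(frame0[0])
    avg = [[0.5 * (frame0[r][c] + frame1[r][c]) for c in range(cols)]
           for r in range(rows)]
    result = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            ix = 0.5 * (avg[r][c + 1] - avg[r][c - 1])   # d/dx (along columns)
            iy = 0.5 * (avg[r + 1][c] - avg[r - 1][c])   # d/dy (along rows)
            it = (frame1[r][c] - frame0[r][c]) / dt
            mag = math.hypot(ix, iy)
            if mag < min_grad:
                continue  # no edge here: the normal direction is undefined
            # Brightness constancy ix*u + iy*v + it = 0 fixes only the
            # component of (u, v) along the gradient direction.
            result[(r, c)] = (-it / mag, ix / mag, iy / mag)
    return result


def ramp(x, y):
    """Hypothetical synthetic image: a linear intensity ramp."""
    return 1.0 * x + 2.0 * y


# The ramp translated by the true displacement (u, v) = (0.5, 0.25) pixels.
f0 = [[ramp(x, y) for x in range(8)] for y in range(8)]
f1 = [[ramp(x - 0.5, y - 0.25) for x in range(8)] for y in range(8)]
un, nx, ny = normal_flow(f0, f1)[(4, 4)]
print(un * nx, un * ny)  # about (0.2, 0.4), not the true (0.5, 0.25)
```

On the translated ramp every edge has the same orientation, so the recovered vector is the projection of the true displacement onto the gradient direction $(1, 2)/\sqrt{5}$; the component along the edge is lost, which is the aperture problem of Figure 3.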




Figure 3: (a) Line feature observed through a small aperture at time t. (b) At time t + δt
the  feature  has  moved  to  a  new  position.  It  is  not  possible  to  determine  exactly  where  each 
point  has  moved  to.  From  local  measurements  only  the  flow  component  perpendicular  to 
the  line  feature  can  be  computed. 

The  constraints  that  have  been  used  in  earlier  work  are  mostly  local  ones.  However,  the 
utilization  of  image  measurements  from  only  small  image  regions  is  extremely  error-prone: 
On  the  one  hand,  image  measurements  are  very  hard  to  compute  accurately.  Even  if  we  just 
compute  the  normal  flow,  the  projection  of  the  retinal  motion  on  the  local  image  gradients, 
we  need  to  use  infinitesimal  computations  and  have  to  approximate  derivatives  by  difference 
quotients,  and  thus  our  computations  can  only  be  approximations.  Much  more  difficult, 
however,  is  the  computation  of  optical  flow  or  disparity  measurements,  which  requires  us  to 
employ  some  additional  assumptions,  usually  smoothness  assumptions,  and  thus  we  run  into 
problems  at  motion  and  depth  boundaries.  On  the  other  hand,  even  if  we  had  reasonably 
accurate  flow,  it  would  not  be  the  case  that  a  small  local  change  in  flow  implies  a  small  change 
in  three-dimensional  motion.  Completely  different  camera  geometries  produce  locally  similar 
disparity  fields.  For  example,  in  an  area  near  the  vertical  axis  in  the  image  plane,  3D  rotation 
around  the  horizontal  axis  in  the  3D  world  produces  a  flow  field  similar  to  the  one  produced 
by  translation  along  the  vertical  axis  in  the  3D  world. 

Also, if we seek to provide solutions for real-time systems, we have to consider time constraints imposed on the actions the system performs. Computations that could not possibly
be  performed  fast,  such  as  linear,  non-parallelizable  algorithms  that  require  a  large  number  of 
steps,  or  optimization  techniques  involving  a  large  number  of  iterations,  are  of  little  interest. 
In  this  category  falls  the  computation  of  exact  image  displacements.  Both  the  estimation  of 
discrete  disparities  and  the  computation  of  optical  flow  require  optimization  techniques  to 
be invoked; and the more effort we put into deriving accurate measurements, by considering
realistic  situations  dealing  with  motion  boundaries  and  thus  modeling  discontinuities,  the 
more  computational  steps  we  have  to  perform. 

In [16, 19] it has been shown that 3D motion information can also be derived from global structure in the motion field, using local image information that is much simpler than optical flow. New constraints defined on vectors in selected directions were found which manifest
themselves  as  patterns  in  the  image  plane.  The  patterns  are  regions  containing  vectors  of 
either  positive  or  negative  values,  which  are  separated  by  conic  sections  and  straight  lines. 
In  the  following  sections  we  will  make  use  of  some  of  these  patterns,  namely  the  “copoint 
patterns”  and  the  “coaxis  patterns”.  A  short  description  of  the  corresponding  constraints 
along  with  the  basic  equations  relating  2D  image  motion  to  3D  motion  and  scene  parameters 
is  given  below. 

The  2D  motion  field  on  an  imaging  surface  is  the  projection  of  the  3D  motion  field  of 
the  scene  points  moving  relative  to  that  surface.  If  this  motion  is  rigid,  it  is  composed  of 
a translation $t = (U, V, W)$ and a rotation $\omega = (\alpha, \beta, \gamma)$. We consider the case of a moving camera in a stationary environment, with the coordinate system fixed to the nodal point of the camera and the image projection on a plane perpendicular to the $Z$-axis at distance $f$ (focal length) from the center. Introducing the coordinates $(x_0, y_0) = (fU/W, fV/W)$ for the direction of translation, the so-called FOE (focus of expansion), we obtain the following well-known equations relating the velocity $\mathbf{u} = (u, v)$ of an image point to the 3D velocity and the depth $Z$ of the corresponding scene point [27]:


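$$u = u_{\mathrm{trans}} + u_{\mathrm{rot}} = (-x_0 + x)\frac{W}{Z} + \alpha\frac{xy}{f} - \beta\left(\frac{x^2}{f} + f\right) + \gamma y \qquad (1)$$

$$v = v_{\mathrm{trans}} + v_{\mathrm{rot}} = (-y_0 + y)\frac{W}{Z} + \alpha\left(\frac{y^2}{f} + f\right) - \beta\frac{xy}{f} - \gamma x \qquad (2)$$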

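We consider the flow along certain directions, i.e., the projection $\mathbf{u}_n$ of the flow $\mathbf{u}$ on the direction $\mathbf{n} = (n_x, n_y)$ (a unit vector), which is given as

$$\mathbf{u}_n = (\mathbf{u} \cdot \mathbf{n})\,\mathbf{n}.$$

Thus we obtain for the value $u_n$ of the vector $\mathbf{u}_n$ along the direction $\mathbf{n}$:

$$u_n = \frac{W}{Z}\big((x - x_0)n_x + (y - y_0)n_y\big) + \left(\alpha\frac{xy}{f} - \beta\left(\frac{x^2}{f} + f\right) + \gamma y\right)n_x + \left(\alpha\left(\frac{y^2}{f} + f\right) - \beta\frac{xy}{f} - \gamma x\right)n_y \qquad (3)$$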
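Equations (1) and (2) and the expanded expression for $u_n$ translate directly into code. In this sketch the helper names `motion_field` and `flow_along` are hypothetical, as are the numerical values; it evaluates the translational and rotational parts of the flow separately and checks that projecting the summed flow onto $\mathbf{n}$ gives the same value as the expanded formula.

```python
import math


def motion_field(x, y, x0, y0, w_over_z, alpha, beta, gamma, f):
    """Hypothetical helper: image velocity at (x, y) following Equations (1) and (2).

    Returns ((u_trans, v_trans), (u_rot, v_rot)); the full flow is their sum.
    x0, y0: FOE; w_over_z: W/Z of the scene point; alpha, beta, gamma: rotation.
    """
    u_trans = (x - x0) * w_over_z
    v_trans = (y - y0) * w_over_z
    u_rot = alpha * x * y / f - beta * (x * x / f + f) + gamma * y
    v_rot = alpha * (y * y / f + f) - beta * x * y / f - gamma * x
    return (u_trans, v_trans), (u_rot, v_rot)


def flow_along(x, y, nx, ny, x0, y0, w_over_z, alpha, beta, gamma, f):
    """Hypothetical helper: scalar u_n = u . n in the expanded form given above."""
    trans = w_over_z * ((x - x0) * nx + (y - y0) * ny)
    rot = ((alpha * x * y / f - beta * (x * x / f + f) + gamma * y) * nx
           + (alpha * (y * y / f + f) - beta * x * y / f - gamma * x) * ny)
    return trans + rot


# Hypothetical values: FOE (0.1, 0.05), W/Z = 0.5, a small rotation, f = 1.
theta = math.radians(30.0)
nx, ny = math.cos(theta), math.sin(theta)
(ut, vt), (ur, vr) = motion_field(0.3, -0.2, 0.1, 0.05, 0.5, 0.01, -0.02, 0.005, 1.0)
print((ut + ur) * nx + (vt + vr) * ny)
print(flow_along(0.3, -0.2, nx, ny, 0.1, 0.05, 0.5, 0.01, -0.02, 0.005, 1.0))  # same, up to rounding
```

Keeping the two parts separate mirrors the structure exploited below: the sign of the translational part depends only on the FOE, the sign of the rotational part only on the rotation.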

By  choosing  particular  directions,  we  define  classes  of  vectors.  In  particular  we  consider 
here  motion  vectors  in  the  direction  of  two  classes  of  vectors,  the  “coaxis  vectors”  and  the 
“copoint  vectors”. 

The  coaxis  vectors  defined  with  respect  to  a  direction  in  space  are  described  as  follows: 
A  line  through  the  image  formation  center  defined  by  the  direction  cosines  (A,  B,  C)  defines 
a family of cones with axis $(A, B, C)$ and apex at the origin. The intersections of the cones with the image plane give rise to a set of conic sections, called field lines of the axis $(A, B, C)$, and the vectors perpendicular to the conic sections are called the $(A, B, C)$ coaxis vectors. Their direction is parallel to the vector $(m_x, m_y)$, which is defined as

$$(m_x, m_y) = \big(-A(y^2 + f^2) + Bxy + Cxf,\; Axy - B(x^2 + f^2) + Cyf\big) \qquad (4)$$
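The expression for $(m_x, m_y)$ can be checked numerically. Because the field lines are the level curves of the cosine of the angle between the viewing ray $(x, y, f)$ and the axis, the coaxis direction should be parallel to the gradient of that function. The sketch below (hypothetical helper names; $(A, B, C)$ assumed to be unit direction cosines) compares $(m_x, m_y)$ with a central-difference gradient: a vanishing 2D cross product confirms the two are parallel, and the negative dot product shows that, with this sign convention, $(m_x, m_y)$ points against the gradient.

```python
import math


def coaxis_direction(A, B, C, x, y, f):
    """Hypothetical helper: (m_x, m_y) of the (A, B, C) coaxis vector at (x, y)."""
    mx = -A * (y * y + f * f) + B * x * y + C * x * f
    my = A * x * y - B * (x * x + f * f) + C * y * f
    return mx, my


def cone_function(A, B, C, x, y, f):
    """Cosine of the angle between the ray (x, y, f) and the unit axis (A, B, C).

    Each cone around the axis is a level set of this function, so its level
    curves in the image plane are the field lines.
    """
    return (A * x + B * y + C * f) / math.sqrt(x * x + y * y + f * f)


def numeric_gradient(fun, x, y, h=1e-6):
    """Central-difference gradient of a scalar function of (x, y)."""
    return ((fun(x + h, y) - fun(x - h, y)) / (2 * h),
            (fun(x, y + h) - fun(x, y - h)) / (2 * h))


A, B, C, f = 0.48, 0.6, 0.64, 1.0   # direction cosines: 0.48^2 + 0.6^2 + 0.64^2 = 1
for x, y in [(0.7, -0.4), (-1.2, 0.3), (0.05, 0.9)]:
    mx, my = coaxis_direction(A, B, C, x, y, f)
    gx, gy = numeric_gradient(lambda px, py: cone_function(A, B, C, px, py, f), x, y)
    # Cross product ~0 means parallel; a negative dot product means opposite.
    print(mx * gy - my * gx, mx * gx + my * gy)
```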

In  order  to  establish  conventions  about  the  vector’s  orientation,  a  vector  will  be  said  to  be 
of positive orientation if it is pointing in direction $(m_x, m_y)$. Otherwise, if it is pointing in direction $(-m_x, -m_y)$, its orientation will be said to be negative (see Figure 4).


Figure 4: Field lines corresponding to an axis (A, B, C) and positive (A, B, C) coaxis vectors.

If we consider for a class of $(A, B, C)$ coaxis vectors the orientation of the translational components, we find that a second-order curve $h(A, B, C, x_0, y_0; x, y) = 0$ (Figure 5a) separates the positive from the negative components, where

$$\begin{aligned} h(A, B, C, x_0, y_0; x, y) &= (x - x_0, y - y_0) \cdot (m_x, m_y) \\ &= x^2(Cf + By_0) + y^2(Cf + Ax_0) - xy(Ay_0 + Bx_0) \\ &\quad - xf(Af + Cx_0) - yf(Bf + Cy_0) + f^2(Ax_0 + By_0) \end{aligned} \qquad (5)$$
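The expanded form of $h$ in Equation (5) can be verified against its definition as a dot product. This sketch (hypothetical function names, random test values) evaluates both forms at many points; they agree to round-off, and $h$ vanishes at the FOE itself, so the curve $h = 0$ passes through it.

```python
import random


def h_expanded(A, B, C, x0, y0, x, y, f):
    """Hypothetical helper: Equation (5), expanded form of h."""
    return (x * x * (C * f + B * y0) + y * y * (C * f + A * x0)
            - x * y * (A * y0 + B * x0)
            - x * f * (A * f + C * x0) - y * f * (B * f + C * y0)
            + f * f * (A * x0 + B * y0))


def h_as_dot(A, B, C, x0, y0, x, y, f):
    """Hypothetical helper: h as defined, (x - x0, y - y0) . (m_x, m_y)."""
    mx = -A * (y * y + f * f) + B * x * y + C * x * f
    my = A * x * y - B * (x * x + f * f) + C * y * f
    return (x - x0) * mx + (y - y0) * my


rng = random.Random(1)
worst = 0.0
for _ in range(1000):
    A, B, C, x0, y0, x, y = (rng.uniform(-2.0, 2.0) for _ in range(7))
    f = rng.uniform(0.5, 2.0)
    worst = max(worst, abs(h_expanded(A, B, C, x0, y0, x, y, f)
                           - h_as_dot(A, B, C, x0, y0, x, y, f)))
print(worst)  # round-off level: the two forms agree
print(h_expanded(0.3, 0.4, 0.866, 0.2, -0.1, 0.2, -0.1, 1.0))  # ~0 at the FOE
```

Since the translational part of the coaxis component is $(W/Z)\,h$ and $W/Z$ is assumed positive, the sign of $h$ alone decides the sign of that part.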

Curve $h = 0$ passes through the FOE and is uniquely defined by the FOE's two image coordinates $(x_0, y_0)$. Similarly, the positive and negative components of the $(A, B, C)$ coaxis vectors due to rotation are separated by a straight line $g(A, B, C, \alpha, \beta, \gamma; x, y) = 0$, where

$$g(A, B, C, \alpha, \beta, \gamma; x, y) = y(\alpha C - \gamma A) - x(\beta C - \gamma B) + \beta A f - \alpha B f \qquad (6)$$

The line passes through the point where the rotation axis pierces the image plane (Figure 5b). This point, whose coordinates are $(f\alpha/\gamma, f\beta/\gamma)$, is called the Axis of Rotation point (AOR). Combining the constraints due to translation and rotation, we obtain the following geometrical
result:  A  second  order  curve  separating  the  plane  into  positive  and  negative  values  and  a  line 
separating  the  plane  into  two  half-planes  of  opposite  sign  intersect.  This  splits  the  plane  into 
areas  of  only  positive  coaxis  vectors,  areas  of  only  negative  coaxis  vectors,  and  areas  in  which 
the  rotational  and  translational  flow  have  opposite  signs.  In  these  last  areas,  no  information 
is  derivable  without  making  depth  assumptions  (Figure  5c).  The  structure  defined  on  the 
coaxis  vectors  is  called  the  coaxis  pattern. 
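The coaxis pattern can be computed point by point. The translational part of $\mathbf{u} \cdot (m_x, m_y)$ is $(W/Z)\,h$, and a short expansion (ours, not given in the text) shows that the rotational part equals $(x^2 + y^2 + f^2)\,g$, so that $\mathbf{u} \cdot (m_x, m_y) = (W/Z)\,h + (x^2 + y^2 + f^2)\,g$. Assuming $W/Z > 0$ (forward translation and visible scene points), the sign of the coaxis vector is therefore known wherever $h$ and $g$ agree in sign, and depends on depth elsewhere. The sketch below uses hypothetical helper names and random parameters; it checks the identity against Equations (1) and (2) and labels points accordingly.

```python
import random


def coaxis_terms(A, B, C, x0, y0, alpha, beta, gamma, x, y, f):
    """Hypothetical helper: (h, g) of Equations (5) and (6) at image point (x, y)."""
    h = (x * x * (C * f + B * y0) + y * y * (C * f + A * x0)
         - x * y * (A * y0 + B * x0)
         - x * f * (A * f + C * x0) - y * f * (B * f + C * y0)
         + f * f * (A * x0 + B * y0))
    g = (y * (alpha * C - gamma * A) - x * (beta * C - gamma * B)
         + beta * A * f - alpha * B * f)
    return h, g


def coaxis_label(h, g):
    """Pattern label, assuming W/Z > 0 (forward translation, points in front)."""
    if h > 0 and g > 0:
        return 1    # positive whatever the depth
    if h < 0 and g < 0:
        return -1   # negative whatever the depth
    return 0        # translation and rotation disagree: no depth-free information


def coaxis_flow(A, B, C, x0, y0, w_over_z, alpha, beta, gamma, x, y, f):
    """Hypothetical helper: u . m computed directly from Equations (1), (2)."""
    u = (x - x0) * w_over_z + alpha * x * y / f - beta * (x * x / f + f) + gamma * y
    v = (y - y0) * w_over_z + alpha * (y * y / f + f) - beta * x * y / f - gamma * x
    mx = -A * (y * y + f * f) + B * x * y + C * x * f
    my = A * x * y - B * (x * x + f * f) + C * y * f
    return u * mx + v * my


# Check u . m = (W/Z) h + (x^2 + y^2 + f^2) g at random (hypothetical) parameters.
rng = random.Random(2)
for _ in range(1000):
    A, B, C, x0, y0, al, be, ga, x, y = (rng.uniform(-1, 1) for _ in range(10))
    f, wz = rng.uniform(0.5, 2.0), rng.uniform(0.01, 5.0)
    h, g = coaxis_terms(A, B, C, x0, y0, al, be, ga, x, y, f)
    lhs = coaxis_flow(A, B, C, x0, y0, wz, al, be, ga, x, y, f)
    assert abs(lhs - (wz * h + (x * x + y * y + f * f) * g)) < 1e-9
    label = coaxis_label(h, g)
    assert label == 0 or (lhs > 0) == (label > 0)

# g vanishes at the AOR (f*alpha/gamma, f*beta/gamma) for any axis (A, B, C).
al, be, ga, f = 0.2, -0.1, 0.5, 1.0
print(coaxis_terms(0.3, 0.4, 0.866, 0.0, 0.0, al, be, ga, f * al / ga, f * be / ga, f)[1])  # ~0
```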

Figure 5: (a) The (A, B, C) coaxis vectors due to translation are negative if they lie within a second-order curve defined by the FOE, and are positive at all other locations. (b) The coaxis vectors due to rotation separate the image plane into a half-plane of positive values and a half-plane of negative values. (c) A general rigid motion defines an area of positive coaxis vectors and an area of negative coaxis vectors. The rest of the image plane is not considered.

For a second kind of classification, the copoint vectors, which are defined with respect to a point, similar patterns are obtained. The $(r, s)$ copoint vectors are the normal motion vectors which are perpendicular to straight lines passing through the point $(r, s)$ (see Figure 6). At point $(x, y)$ an $(r, s)$ copoint vector $(o_x, o_y)$ of unit length in the positive direction is defined as

$$(o_x, o_y) = \frac{(-y + s,\; x - r)}{\sqrt{(x - r)^2 + (y - s)^2}} \qquad (7)$$
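The copoint vector definition can be evaluated directly. A short expansion, not given in the text, also shows why the translational components of the copoint vectors are separated by a straight line: up to the positive factor $(W/Z)/\sqrt{(x-r)^2 + (y-s)^2}$, the translational component along $(o_x, o_y)$ is $(x - x_0)(s - y) + (y - y_0)(x - r) = x(s - y_0) + y(x_0 - r) + y_0 r - x_0 s$, whose $xy$ terms cancel. This linear function vanishes at both the FOE and the copoint, so the separating line is the line through $(x_0, y_0)$ and $(r, s)$. The helper names and values below are hypothetical.

```python
import math


def copoint_vector(r, s, x, y):
    """Hypothetical helper: unit (r, s) copoint vector at (x, y), positive orientation."""
    d = math.hypot(x - r, y - s)
    return (-(y - s) / d, (x - r) / d)


def copoint_translation_line(r, s, x0, y0, x, y):
    """Hypothetical helper: (x - x0, y - y0) . (-(y - s), x - r).

    The translational component along the copoint vector equals
    (W/Z) * copoint_translation_line(...) / d, so this linear function of (x, y)
    carries its sign; it vanishes at the FOE and at the copoint (r, s).
    """
    return (x - x0) * (s - y) + (y - y0) * (x - r)


r, s, x0, y0, w_over_z = 1.0, 2.0, -0.5, 0.3, 0.8
x, y = 3.0, 5.0
ox, oy = copoint_vector(r, s, x, y)
print(ox * (x - r) + oy * (y - s))     # ~0: perpendicular to the line through (r, s)
trans = w_over_z * ((x - x0) * ox + (y - y0) * oy)
d = math.hypot(x - r, y - s)
print(trans, w_over_z * copoint_translation_line(r, s, x0, y0, x, y) / d)  # equal, up to rounding
```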


For  the  copoint  vectors,  the  rotational  components  are  separated  by  a  second  order  curve  into 
positive  and  negative  values  and  the  translational  components  are  separated  by  a  straight 
line.  The  structure  defined  on  the  copoint  vectors  is  called  the  copoint  pattern. 


Figure  6:  Positive  copoint  vectors  (r,s). 

Of particular interest for this application are the $(r, s)$ copoint vectors for which the copoint $(r, s)$ lies at infinity. The corresponding copoint vectors are all parallel to each other with gradient $n_y/n_x = -r/s$ (see Figure 7a, b). For these cases the line separating the translational components is perpendicular to the gradient vector $(n_x, n_y)$ and has the following form:

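$$k(n_x, n_y, x_0, y_0; x, y) = (x - x_0)n_x + (y - y_0)n_y \qquad (8)$$

The curve separating the rotational components is the hyperbola $l(n_x, n_y, \alpha, \beta, \gamma; x, y) = 0$, where

$$l(n_x, n_y, \alpha, \beta, \gamma; x, y) = \left(\alpha\frac{xy}{f} - \beta\left(\frac{x^2}{f} + f\right) + \gamma y\right)n_x + \left(\alpha\left(\frac{y^2}{f} + f\right) - \beta\frac{xy}{f} - \gamma x\right)n_y \qquad (9)$$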
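For parallel copoint vectors the decomposition is exact: $(W/Z)\,k$ is the translational part of the flow along $\mathbf{n}$ and $l$ is its rotational part, so $u_n = (W/Z)\,k + l$. Assuming $W/Z > 0$, a point can be labeled positive or negative wherever $k$ and $l$ share a sign. The sketch below (hypothetical helper names and parameter values) renders such a pattern as text, with `+`, `-` and `.` for positive, negative and undetermined points.

```python
import math


def k_line(nx, ny, x0, y0, x, y):
    """Equation (8); with W/Z > 0 its sign is that of the translational component."""
    return (x - x0) * nx + (y - y0) * ny


def l_hyperbola(nx, ny, alpha, beta, gamma, x, y, f):
    """Equation (9): rotational component of the flow along the fixed direction n."""
    return ((alpha * x * y / f - beta * (x * x / f + f) + gamma * y) * nx
            + (alpha * (y * y / f + f) - beta * x * y / f - gamma * x) * ny)


def pattern_label(k_val, l_val):
    """Hypothetical helper: +1 / -1 where both parts agree in sign, 0 elsewhere."""
    if k_val > 0 and l_val > 0:
        return 1
    if k_val < 0 and l_val < 0:
        return -1
    return 0


# Render the pattern on a coarse grid (hypothetical FOE, rotation and direction).
n = (math.cos(0.3), math.sin(0.3))
for i in range(9):
    y = 0.8 - 0.2 * i
    row = ""
    for j in range(17):
        x = -0.8 + 0.1 * j
        lab = pattern_label(k_line(*n, 0.1, -0.2, x, y),
                            l_hyperbola(*n, 0.02, -0.03, 0.01, x, y, 1.0))
        row += {1: "+", -1: "-", 0: "."}[lab]
    print(row)
```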


Figure 7: (a) Parallel copoint vectors. (b) Corresponding pattern.

The positions of the coaxis and copoint vectors in the image plane encode the parameters describing the axis of translation and the direction of the rotation axis. Thus, these constraints allow the problem of 3D motion estimation to be formulated as a pattern recognition problem. If the system can estimate the sign of the flow along the directions defined by various families of coaxis (or copoint) vectors, then by localizing a number of patterns, or more precisely, the boundaries of the regions separating positive and negative vectors, it can find the axes of translation and rotation. The intersection of the coaxes' second order curves and the copoints' lines provides the FOE, and the intersection of the coaxes' lines and the copoints' curves provides the AOR.

How  much  information  will  be  available  for  pattern  fitting,  and  thus  how  accurately  the 
FOE  and  AOR  can  be  localized,  depends  on  the  computability  of  flow  information.  If  the 
system  is  able  to  derive  optical  flow  then  it  is  able  to  estimate  the  sign  of  the  projection  of 
flow  along  any  direction,  and  thus  for  every  pattern  at  every  point  information  is  available. 
If,  however,  the  system  is  less  powerful  and  can  only  compute  the  sign  of  the  flow  in  one  or 
a  few  directions,  then  patterns  are  matched  as  before.  The  difference  is  that  information  is 
not  available  for  every  point  and  consequently  the  uncertainty  in  pattern  matching  may  be 
larger,  and  the  FOE  and  AOR  can  only  be  located  within  bounds.  In  the  simplest  case,  which 
requires  the  least  amount  of  computational  effort,  the  flow  in  only  one  direction,  namely  the 
one  perpendicular  to  the  local  edge,  is  computed  (the  normal  flow).  But  even  this  minimal 
amount  of  information  can  lead  to  rather  small  uncertainty  in  the  motion  estimation. 
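The following sketch illustrates how sign information from normal flow alone bounds the FOE. It is a brute-force search over a grid of candidate FOEs, assuming purely translational flow with $Z > 0$ and $W > 0$; each measurement is a point, a normal direction, and the observed sign of $u_n$. The data are synthetic and the grid resolution is arbitrary.

```python
import math
import random


def normal_flow_sign(x, y, nx, ny, x0, y0):
    """Sign of the translational normal flow at (x, y) along (nx, ny), assuming Z > 0 and W > 0."""
    k = (x - x0) * nx + (y - y0) * ny
    return (k > 0) - (k < 0)


def feasible_foes(measurements, grid):
    """Grid candidates (x0, y0) whose predicted signs agree with every measurement."""
    keep = []
    for x0 in grid:
        for y0 in grid:
            if all(normal_flow_sign(x, y, nx, ny, x0, y0) == s
                   for x, y, nx, ny, s in measurements):
                keep.append((x0, y0))
    return keep


# Synthetic normal-flow signs for a hypothetical FOE at (0.3, -0.2).
rng = random.Random(0)
true_foe = (0.3, -0.2)
measurements = []
for _ in range(200):
    x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    theta = rng.uniform(0.0, math.pi)             # normal direction of the local edge
    nx, ny = math.cos(theta), math.sin(theta)
    measurements.append((x, y, nx, ny, normal_flow_sign(x, y, nx, ny, *true_foe)))

grid = [i / 10 for i in range(-10, 11)]
candidates = feasible_foes(measurements, grid)
x_bounds = (min(c[0] for c in candidates), max(c[0] for c in candidates))
```

The spread of `candidates` gives the bounds within which the FOE is located; adding measurements can only remove candidates, never add them.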

If  the  rigid  motion  estimation  problem  is  considered  for  a  passive  system,  a  search  in  the 
appropriate  parameter  spaces  has  to  be  performed.  Every  single  pattern  is  defined  by  three 
unknown  parameters  (a  second  order  curve  of  two  unknowns  and  a  line  of  one  unknown). 
A  general  rigid  motion  for  which  no  information  is  available  that  could  reduce  the  space  of 
possible  solutions  thus  requires  a  search  in  various  3D  subspaces.  If,  however,  the  system 
is  active,  and  if  it  has  the  capability  to  control  its  motor  apparatus,  the  above  described 
constraints  may  be  utilized  in  much  more  efficient  ways. 
3.2 Using visual patterns in the servomechanism of a moving system

The retinal motion field perceived by the robot's camera is due to translation and rotation. The direction of translational motion is defined by the angle between the direction in which the robot is moving and the direction in which the camera is pointing. The rotation originates from body motion and is mainly due to the robot's turning around the y-axis. There could also be some rotation around the x-axis because the surface on which the robot is moving might be uneven, but there will be no or only very small rotation around the z-axis (cyclotorsion).

Recall that the goal of the visual task is to change the robot's motion such that the direction of the forward motion and the direction of the heading have the same projection on the xz-plane of the camera coordinate system. Stated in terms of motion parameters, this means that we want the x-coordinate of the FOE to be zero, but we do not care about the y-coordinate.

Let us now investigate the patterns of positive and negative flow vectors which correspond to such motion. We first consider the copoint patterns with parallel motion vectors. If the FOE is on the y-axis, i.e. $x_0 = 0$, equation (8) describing the translational vectors becomes

$$k(n_x, n_y, 0, y_0; x, y):\; x\,n_x + (y - y_0)\,n_y = 0 \quad\text{or}\quad y = -\frac{n_x}{n_y}\,x + y_0, \qquad (10)$$

which constitutes a line perpendicular to the gradient $(n_x, n_y)$ with intercept $y_0$. In particular, for the horizontal gradient direction ($n_x = 1$, $n_y = 0$) we obtain the simplified equation

$$k(1, 0, 0, y_0; x, y):\; x = 0$$

During the process of steering, while the FOE is not aligned with the y-axis, the flow field due to translation is separated into positive and negative vectors by

$$k(n_x, n_y, x_0, y_0; x, y):\; y = -\frac{n_x}{n_y}\,x + y_0 + x_0\,\frac{n_x}{n_y}$$

and the flow vectors with gradient $(0, 1)$ are separated by the horizontal line

$$k(0, 1, x_0, y_0; x, y):\; y = y_0$$

The  rotation  around  the  y-axis  is  controlled  by  the  robot’s  steering  mechanism.  Therefore 
the  robot  has  knowledge  about  this  rotation  and  can  compensate  for  the  resulting  flow 
component  perceived  on  the  image  by  subtracting  it  from  the  visual  motion  field.  Of  course, 
we  cannot  assume  the  exact  amount  of  rotation  around  the  y-axis  to  be  known,  but  we  can 
assume  that  we  know  a  good  approximation  to  it.  This  additional  knowledge  makes  one  of 
the  patterns,  namely  the  copoint  pattern  with  gradient  (1,0),  particularly  suitable  for  fast 
estimation  of  the  parameters  to  be  controlled. 
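Derotation amounts to subtracting the normal-flow component predicted for the commanded rotation. A minimal sketch, assuming the rotational flow model of (9) without cyclotorsion and with only the y-axis rate known (here a hypothetical estimate `beta_est` taken from the steering command):

```python
def rotational_flow(x, y, alpha, beta, f):
    """Rotational image flow (u, v) for rotation rates alpha (x-axis) and beta (y-axis)."""
    u = alpha * x * y / f - beta * (x * x / f + f)
    v = alpha * (y * y / f + f) - beta * x * y / f
    return u, v


def derotate(un, x, y, nx, ny, beta_est, f):
    """Remove the normal-flow component predicted for an estimated y-axis rate beta_est."""
    u_rot, v_rot = rotational_flow(x, y, 0.0, beta_est, f)
    return un - (u_rot * nx + v_rot * ny)


# Synthetic measurement along gradient (1, 0): translation plus y-axis rotation.
f, x, y = 500.0, 40.0, -25.0
translational = 0.02 * (x - 10.0)          # (W/Z)(x - x0) with W/Z = 0.02, x0 = 10
u_rot, _ = rotational_flow(x, y, 0.0, 0.003, f)
measured = translational + u_rot
print(derotate(measured, x, y, 1.0, 0.0, 0.003, f))  # approximately 0.6
```

If `beta_est` differs from the true rate by $\delta$, the residual along the gradient $(1, 0)$ is $-\delta(x^2/f + f)$, which is nearly constant over a narrow field of view.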

Let us consider the rotational components of the copoint pattern with gradient (1,0). Within the field of view, the rotation around the x-axis gives rise to flow vectors which are mostly parallel to the y-axis and thus perpendicular to the chosen gradients. As a result, their components along the gradient direction (1,0) are close to zero. The remaining (not derotated) rotation around the y-axis produces flow nearly parallel to the gradients and thus adds a small, nearly constant component to every flow vector. In summary, the contribution of the rotation to the pattern can be described as follows: the line in the translational pattern is shifted by a small amount in the direction defined by the sign of the (not derotated) rotation around the y-axis.

For the purpose of the servoing task it will be sufficient to approximate the copoint pattern with gradient (1,0) by its translational flow field components. Using this approximation
gives  us  the  advantage  of  deriving  the  x-component  of  the  FOE  with  very  little  effort;  we  just 
fit  a  line  perpendicular  to  the  gradient  direction  separating  positive  from  negative  vectors. 
This  approximation  will  not  affect  the  successful  accomplishment  of  the  task.  As  the  robot 
approaches  its  goal,  the  steering  motion  it  has  to  apply  becomes  smaller  and  smaller  and 
thus  the  additional  rotational  flow  field  component  also  decreases,  which  in  turn  allows  the 
FOE  to  be  estimated  more  accurately. 
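Fitting the line for the gradient (1,0) reduces to choosing a threshold $x_0$ on the image x-coordinate. A minimal sketch, assuming derotated signs with positive values to the right of the FOE ($W > 0$): it tries the midpoints between neighbouring sample positions and keeps the one with the fewest sign disagreements, which tolerates a few wrong signs. The function name is hypothetical.

```python
def fit_separating_x(samples):
    """samples: (x, s) pairs with s = +1 or -1, the sign of the component along (1, 0).
    Returns the threshold x0 minimizing the number of points with sign(x - x0) != s."""
    xs = sorted(set(x for x, _ in samples))
    candidates = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])

    def disagreements(x0):
        return sum(1 for x, s in samples if (1 if x > x0 else -1) != s)

    return min(candidates, key=disagreements)


# Signs for a FOE near x = 0.15, with one wrong sign at x = -0.4.
samples = [(-0.4, 1), (-0.2, -1), (0.0, -1), (0.1, -1), (0.2, 1), (0.3, 1), (0.5, 1)]
x0_estimate = fit_separating_x(samples)
```

The estimate can only be resolved to the gap between neighbouring samples; denser edge points give a tighter bound on $x_0$.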

If it is certain that the rotation around the x-axis is also very small, then any other copoint pattern with some gradient $(n_x, n_y)$ could be used in addition to estimate the FOE's coordinates, again approximating the pattern as purely translational.

Instead of utilizing copoint vectors we could equally well employ a class of coaxis vectors, namely those which correspond to axes in the xy-plane, the $(A, B, 0)$ coaxis patterns. For these patterns the hyperbola separating the positive from the negative translational vectors becomes

$$h(A, B, 0, x_0, y_0; x, y):\; B y_0 x^2 + A x_0 y^2 - (A y_0 + B x_0)\,xy - A f^2 x - B f^2 y + A f^2 x_0 + B f^2 y_0 = 0 \qquad (11)$$

Since within the field of view $f^2$ is much larger than the quadratic terms in the image coordinates ($x^2$, $y^2$, $xy$), (11) can be approximated as

$$h(A, B, 0, x_0, y_0; x, y):\; f^2(-Ax - By + Ax_0 + By_0) = 0 \quad\text{or}\quad y = -\frac{A}{B}\,x + \frac{A}{B}\,x_0 + y_0,$$

which describes a line with slope $-A/B$ and intercept $\frac{A}{B}x_0 + y_0$.

If $x_0 = 0$ the intercept is $y_0$. Again, one of these patterns, which allows $x_0$ to be derived directly, is of particular interest to us. This is the pattern corresponding to the axis (1,0,0). We call this pattern the α-pattern and the corresponding vectors the α-vectors, since they do not contain any rotation around the x-axis (denoted in the equations by α):

$$h(1, 0, 0, x_0, y_0; x, y):\; x_0 y^2 - y_0 xy - x f^2 + x_0 f^2 = 0,$$

which simplifies to $x = x_0$.
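Solving the exact α-pattern curve for $x$ shows how good the approximation $x = x_0$ is: $x = x_0(f^2 + y^2)/(f^2 + y_0 y)$. The deviation from $x_0$ is proportional to $x_0$ itself, so it vanishes as the servo drives $x_0$ to zero. A small numerical check, with a hypothetical focal length in pixels:

```python
def alpha_boundary_exact(y, x0, y0, f):
    """x on the exact alpha-pattern curve x0*y^2 - y0*x*y - x*f^2 + x0*f^2 = 0 at image row y."""
    denom = f * f + y0 * y
    if denom == 0.0:
        raise ValueError("the curve has no finite point on this row")
    return x0 * (f * f + y * y) / denom


f = 500.0  # hypothetical focal length in pixels
for x0 in (20.0, 5.0, 0.0):
    exact = alpha_boundary_exact(100.0, x0, 10.0, f)
    print(x0, exact, exact - x0)   # the deviation shrinks together with x0
```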

The α-vectors do not contain any rotation around the x-axis, and the components due to rotation around the y-axis are nearly constant. As in the previous case we can approximate the α-pattern by its translational component, which is of a particularly simple form. And again, if we know that rotation around the x-axis is small, we can employ many $(A, B, 0)$ coaxis patterns in the estimation of the FOE to obtain more data.




3.3  Servo  system 


For the purpose of our analysis, we will first consider our servomechanism to be a continuous-control linear system. The input signals encountered in the present application are step functions. Thus, a natural way to approach the design of our servomechanism is to consider the performance of the system in response to a step-function input. Indeed, if our servomechanism is truly linear, its performance characteristics are completely summarized by its response to a step-function input.

Typical control laws for such a system include proportional control, derivative control, integral control, proportional plus derivative control, and so on. The effect of the proportional gain, $K_p$, is to drive the system at higher velocities when the error is larger. The derivative gain, $K_d$, accelerates the response when it is falling behind and decelerates it when it is overtaking the stimulus, which can speed up the response. However, a higher derivative gain produces an oscillatory response. The integral gain $K_i$, on the other hand, compensates for delays and disturbances. In the following case study only a proportional controller is considered, though in practice a PID controller could be used to achieve better performance.
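For reference, a textbook discrete-time PID law is sketched below. In our setting the error would be the FOE coordinate $x_0$ and the output the commanded rotation rate $\beta$; setting `ki = kd = 0` gives the proportional controller used in the case study. The class name, the rectangular integration, and the backward-difference derivative are our choices, not details of the experiments.

```python
class PIDController:
    """Textbook discrete PID law: rectangular integration, backward-difference derivative."""

    def __init__(self, kp, ki=0.0, kd=0.0, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, error):
        """Return the control output for the latest error sample."""
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0          # no derivative kick on the first sample
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Proportional-only steering, beta = K * x0, with a hypothetical K = 0.002.
steer = PIDController(kp=0.002)
beta = steer.update(150.0)   # x0 = 150 pixels
```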

To set up the control loop equation, we must relate the robot's motion to the image motion. Referring to the coordinate systems defined in Figure 1, we denote the velocity of the robot's forward translation by $S$ and the velocity of its rotation around the y-axis by $\beta$. $\theta_y$ and $\theta_x$ are, respectively, the pan and tilt of the orientation of the camera with respect to the coordinate frame of the robot.

Using the subscripts R and C to denote the robot and the camera, respectively, we can express the motion of the robot in the camera's coordinate frame as follows. First, let $P$ be the position vector that relates the origins of the two coordinate frames. The rotation matrix ${}^{C}R_{R}$ that relates the orientations of the frames is of the following form:

$${}^{C}R_{R} = \begin{pmatrix} \cos\theta_y & 0 & \sin\theta_y \\ -\sin\theta_y\sin\theta_x & \cos\theta_x & \cos\theta_y\sin\theta_x \\ -\sin\theta_y\cos\theta_x & -\sin\theta_x & \cos\theta_y\cos\theta_x \end{pmatrix}$$

Using $T$ to denote translation and $\omega$ to denote rotation, we can then express the motion of the robot and that of the camera in their respective coordinate frames as follows:

$$T_R = (0, 0, S)^T, \qquad \omega_R = (0, \beta, 0)^T$$

$$T_C = {}^{C}R_{R}\,(T_R + \omega_R \times P) = (S\sin\theta_y,\; S\cos\theta_y\sin\theta_x,\; S\cos\theta_y\cos\theta_x)^T$$

$$\omega_C = {}^{C}R_{R}\,\omega_R = (0,\; \beta\cos\theta_x,\; -\beta\sin\theta_x)^T$$

Thus the coordinates of the FOE $(x_0, y_0)$ that we computed for the camera motion are related to pan and tilt as follows:

$$y_0 = f\tan\theta_x \qquad (12)$$

$$x_0 = f\,\frac{\tan\theta_y}{\cos\theta_x} \qquad (13)$$
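Relations (12) and (13) can be checked numerically by rotating the robot's translation into the camera frame and projecting it. The sketch uses the matrix ${}^{C}R_{R}$ given above and, like the final expression for $T_C$, takes $T_C = {}^{C}R_{R}\,T_R$, i.e. the $\omega_R \times P$ term is not included; all numeric values are hypothetical.

```python
import math


def rotation_camera_from_robot(theta_x, theta_y):
    """The matrix cR_R above; theta_y is the pan and theta_x the tilt (radians)."""
    cx, sx = math.cos(theta_x), math.sin(theta_x)
    cy, sy = math.cos(theta_y), math.sin(theta_y)
    return [[cy, 0.0, sy],
            [-sy * sx, cx, cy * sx],
            [-sy * cx, -sx, cy * cx]]


def mat_vec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def foe_from_pan_tilt(theta_x, theta_y, speed, f):
    """FOE (x0, y0) of the camera for forward robot translation T_R = (0, 0, S)."""
    tx, ty, tz = mat_vec(rotation_camera_from_robot(theta_x, theta_y), [0.0, 0.0, speed])
    return f * tx / tz, f * ty / tz


x0, y0 = foe_from_pan_tilt(theta_x=0.1, theta_y=0.3, speed=2.0, f=500.0)
```

With zero pan the first component vanishes, so $x_0 = 0$ regardless of the tilt.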


Our servomechanism should steer the robot in such a way that $\theta_y$ becomes zero. Referring to (13), we note that if the value of $\theta_x$ is known, we can compute the value of $\theta_y$, thereby allowing the servo system to regulate it directly to zero. However, we do not have to assume knowledge of $\theta_x$: since $x_0 = 0$ exactly when $\tan\theta_y = 0$, regulating $x_0$ to zero accomplishes the same goal.

The position $x_0$ is used as the input to the servo system to control the amount of steering the robot has to perform. If the servo system is operated with a proportional controller, the rotational speed $\beta$ of the robot is given by

$$\beta = K x_0$$

Writing $\beta$ as $-d\theta_y/dt$ and substituting (13) for $x_0$, we obtain

$$\frac{d\theta_y}{dt} = -K f\,\frac{\tan\theta_y}{\cos\theta_x} \qquad (14)$$

It is instructive to perform some simplifications so as to allow us to analyze how the aforementioned servo system responds to a step input. Since $\theta_x$, the tilt, is usually very small, possibly slowly time-varying due to the camera's fixation on the feature p, we treat the denominator term $\cos\theta_x$ as a constant whose value is near 1. Furthermore, for the FOE to lie inside the image, $\theta_y$ must be smaller than half the field of view of the camera. Thus even for a camera with a wide field of view (on the order of 70°), $\tan\theta_y$ is reasonably approximated by the linear term $\theta_y$ (at $\theta_y = 35°$, $\tan\theta_y \approx 0.70$ versus $\theta_y \approx 0.61$ rad). This approximation becomes better as we approach the straight-ahead direction, since $\theta_y$ tends to zero. With these two simplifications, we have $x_0 = K_v\theta_y$, where $K_v$ is given by $f/\cos\theta_x$. The overall servo system can be represented schematically as in Figure 8.


Figure  8:  Servo  control  system. 

We are now in a position to study various properties of the servo system. Referring to Figure 8, let the amplifier gain be $K_a$; then its voltage output $V$ is given by

$$V = K_a e = K_a\theta_y \qquad (15)$$


If the motor has no time lag and a speed at all times proportional to $V$,² then:

$$\beta = -\frac{d\theta_y}{dt} = K_m V \qquad (16)$$

where $K_m$ represents the motor constant. Combining equations (15) and (16), we obtain the differential equation

$$\frac{d\theta_y}{dt} = -K\theta_y \qquad (17)$$

where $K$ is $K_a K_m$. The solution of (17) is given by $\theta_y = \theta_y^0 e^{-Kt}$, where $\theta_y^0$ denotes the initial value of the pan.
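The sketch below integrates the pan dynamics with forward Euler, once with the tangent of (14) (taking $\cos\theta_x = 1$ and lumping $Kf$ into a single gain $g$) and once with the linearized form (17). For the nonlinear version, $\frac{d}{dt}\sin\theta_y = \cos\theta_y\,\dot\theta_y = -g\sin\theta_y$ gives the closed form $\sin\theta_y(t) = \sin\theta_y^0\,e^{-gt}$, which serves as a check on the integration; the gain and step size are illustrative.

```python
import math


def simulate_pan(theta0, gain, t_end, dt=1e-3, linear=False):
    """Forward-Euler integration of d(theta)/dt = -gain * tan(theta), or -gain * theta if linear."""
    theta = theta0
    for _ in range(int(round(t_end / dt))):
        rate = theta if linear else math.tan(theta)
        theta -= gain * rate * dt
    return theta


theta0 = math.radians(30.0)   # hypothetical initial pan
nonlinear = simulate_pan(theta0, gain=1.0, t_end=5.0)
linear = simulate_pan(theta0, gain=1.0, t_end=5.0, linear=True)
```

Because $\tan\theta_y > \theta_y$ for positive angles, the nonlinear loop converges slightly faster than the linear model predicts; both approach the straight-ahead direction.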


3.4  Intermittent  data 


The  servo  considered  thus  far  operates  on  the  basis  of  error  data  supplied  continuously.  It 
is  stable  if  the  poles  all  lie  in  the  half-plane  to  the  left  of  the  imaginary  axis.  However,  in 
our  actual  system,  the  servo  is  actuated  by  error  data  supplied  intermittently,  at  discrete 
moments  equally  spaced  in  time.  The  servo  receives  no  information  whatsoever  about  the 
error  during  the  period  between  two  consecutive  pulses.  We  consider  the  case  where  the 
corrective  action  of  the  servo  consists  in  continually  exerting  a  torque  on  the  output  shaft  in 
such  a  fashion  that  the  torque  is  always  proportional  to  the  error  found  at  the  immediately 
preceding  measurement.  The  torque  then  remains  constant  during  the  interval  between  two 
measurements, changing stepwise at each measurement. It is clear that overcorrection and instability will occur when the corrective torque per unit error is too large.

We can perform a standard analysis to convert a continuous system to a sampled-data one. Especially simple is the case when the continuous servo is controlled by (17). The transfer function $y(z)$ of the sampled-data version is given by

$$y(z) = \frac{K}{z + K}$$


In our present considerations, the unit circle plays the role that the half-plane to the left of the imaginary axis plays in the theory of continuous error input. Thus, the root of the denominator of $y(z)$, $z = -K$, must be less than 1 in magnitude. In other words, $|K|$ must be less than 1.
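The stability condition can be illustrated by iterating a difference equation whose transfer function is $K/(z + K)$, namely $y_{n+1} = K(u_n - y_n)$; this particular realization is our assumption. For a unit step the output settles when $|K| < 1$ and diverges when $|K| > 1$.

```python
def step_response(K, n_steps):
    """Unit-step response of y[n+1] = K * (u[n] - y[n]), i.e. Y(z)/U(z) = K / (z + K)."""
    y = 0.0
    response = []
    for _ in range(n_steps):
        y = K * (1.0 - y)
        response.append(y)
    return response


stable = step_response(0.5, 60)     # pole at z = -0.5: settles at K / (1 + K) = 1/3
unstable = step_response(1.5, 40)   # pole at z = -1.5: oscillates with growing amplitude
```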


4  Task  Two:  Pursuing  a  Moving  Target 


The task discussed in the preceding section is limited to the case where the input is a step function, i.e., the desired heading direction is fixed. We next consider the case where the servo system is called upon to follow a time-varying input. We encounter such a situation when the system is involved in the pursuit of a moving target. In addition, we impose a second constraint on the control system: while tracking a target, we want the system to maintain a constant distance from the target. Such a task requires more complex visual information processing. In particular, we compute from the images the time to collision, or time to contact [25], from the observer to the target, and we use it in the control of the robot.

² Actually, all motors have time constants, that is, they exhibit inertial effects, and the differential equation must be modified by adding a term (with $T_m$ the motor time constant):

$$T_m\frac{d^2\theta_y}{dt^2} + \frac{d\theta_y}{dt} = -K\theta_y$$

We will now discuss the estimation of time to contact from images; we will also discuss related work and describe the technique used here. Note that the traditional methods by which this is accomplished cannot be used in our task. We first discuss the simple case of tracking a time-varying input via a PID controller. Then, an additional controller that makes use of the time to contact information is incorporated to meet our more elaborate criteria.

4.1  Estimating  time  to  contact  when  the  observer  and  the  target  are  moving 

The time to contact between the camera and a scene point is defined as the value $Z/W$, with $Z$ being the depth and $W$ the relative forward translational speed of the camera with respect to the point. If the scene point lies on the moving target, then, assuming constant relative motion over time, $Z/W$ expresses the time left until the target will hit the infinitely large image plane. However, computing the time to contact of a point on the target is greatly complicated by the difficulty of computing the target's relative motion if the image of the target covers only a small part of the image plane. In the literature a number of methods have been proposed which are based on the utilization of divergence [10, 28, 32, 36]. However, divergence involves the computation of the spatial derivatives of flow and thus is very hard to compute. Also, divergence depends on two components, which cannot be separated without additional knowledge about the motion and the shape of the target: the time to contact, and the product of the fronto-parallel translational velocity and a function of the slope and tilt of the local plane in view. Another approach that has been proposed is based on the utilization of constraints from fixation and tracking [17]. The change of rotation needed to track the target accurately is related to the target's change in depth, which in turn is related to the time to contact. The problem with this approach is that it requires the robot to keep track of its own exact motion during tracking; this is a difficult task if the robot is not static but moving itself.

The way to circumvent these problems is to use visual information not from the area covered by the target, but from the area surrounding the target. In particular, if the target is moving on the ground, we can utilize the area at the bottom of the target, which usually is at about the same distance as the target. From this we can obtain a measure $Z/W$, where $W$ is the forward translational velocity of the robot with respect to the static scene and $Z$ is the depth of points close to the target.

Using the patterns described in Sections 3.1 and 3.2 we can estimate the motion of the observer. In particular, we can utilize the copoint vectors parallel to the x-axis or the α-vectors very efficiently, even if there is additional rotational motion around the x-axis. In general, knowing the 3D motion and knowing the normal flow at a point allows us to derive
$Z/W$ at the corresponding point from (3) as

$$\frac{Z}{W} = \frac{(x - x_0)\,n_x + (y - y_0)\,n_y}{u_n - (u_{rot}\,n_x + v_{rot}\,n_y)} \qquad (18)$$

For the copoint vectors with gradient (1,0) and the α-vectors, the value of the flow along the gradient can be approximated as

$$u_n \approx \frac{W}{Z}\,(x - x_0)$$

and thus

$$\frac{Z}{W} \approx \frac{x - x_0}{u_n}$$

Since we have estimated $x_0$, we can estimate the time to contact from this relationship by measuring the normal flow at a number of scene points and solving an overdetermined system of linear equations using least-squares minimization.
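A minimal sketch of this least-squares step, assuming the approximation $u_n \approx (W/Z)(x - x_0)$ holds at all selected points and that they share one depth: each point gives one equation $\tau\,u_n = x - x_0$ in the single unknown $\tau = Z/W$, whose least-squares solution is $\tau = \sum u_n(x - x_0)/\sum u_n^2$. With $x$ in pixels and $u_n$ in pixels per frame, $\tau$ is measured in frames. The function name and data are hypothetical.

```python
def time_to_contact(xs, normal_flows, x0):
    """Least-squares tau = Z/W from the equations tau * u_n = x - x0 (one per point)."""
    num = sum(u * (x - x0) for x, u in zip(xs, normal_flows))
    den = sum(u * u for u in normal_flows)
    if den == 0.0:
        raise ValueError("no normal flow measured; time to contact is undefined")
    return num / den


# Synthetic points near the bottom of the target, with x0 = 10 px and tau = 25 frames.
xs = [40.0, 60.0, 80.0, 100.0, 120.0]
flows = [(x - 10.0) / 25.0 for x in xs]
tau = time_to_contact(xs, flows, 10.0)
```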

4.2  Simple  tracking 

By simple tracking, we mean that the pursuer is not concerned with its distance from the target. Several additional considerations arise since the input is now time-varying. Foremost among these is that the rise time and the damping should be improved. Furthermore, the type of the system should be increased so as to eliminate steady-state error in response to a ramp input. Thus a PID controller is needed, where the PI portion is used to improve the steady-state error of the system, and the PD portion is used to meet the damping requirement. More generally, one can use a lead-lag controller to achieve the desired compensation.

Other  techniques  might  be  used  to  speed  up  the  response  of  the  system.  Gain  scheduling 
is  one  such  method.  Basically,  gain  scheduling  approximates  a  nonlinear  function  with  several 
linear  pieces,  each  of  which  admits  a  linear  control  design.  A  typical  example  of  such  attempts 
is  found  in  the  design  of  an  aircraft  autopilot,  where  an  airplane  system  model  is  linearized 
around  several  hundred  operating  points  selected  within  the  plane’s  operating  region.  In  our 
case,  the  adjustable  fixed  gains  are  selected  as  a  function  of  the  FOE  position.  When  the 
FOE  is  out  of  the  image,  a  relatively  high  torque  is  applied  to  quickly  bring  the  FOE  within 
the  field  of  view  of  the  camera.  As  the  system  approaches  the  straight-ahead  direction,  the 
torque  is  “stepped  down”  according  to  a  gain  schedule.  This  improves  the  transient  speed 
of  response  without  bringing  in  steady-state  oscillations. 
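A gain schedule of this kind can be as simple as a lookup on the normalized FOE position. In the sketch below the breakpoints and gains are hypothetical; `half_width` is half the image width in the same units as $x_0$, and a ratio of 1 or more means the FOE lies outside the image.

```python
# Hypothetical schedule: (lower bound on |x0| / half_width, gain), largest errors first.
GAIN_SCHEDULE = [
    (1.0, 2.0),   # FOE outside the image: high gain to bring it back quickly
    (0.5, 1.0),
    (0.1, 0.5),
    (0.0, 0.2),   # close to straight ahead: low gain
]


def scheduled_gain(x0, half_width, schedule=GAIN_SCHEDULE):
    """Pick the gain for the current FOE offset from a piecewise-constant schedule."""
    ratio = abs(x0) / half_width
    for lower_bound, gain in schedule:
        if ratio >= lower_bound:
            return gain
    return schedule[-1][1]


def steering_rate(x0, half_width):
    """Proportional steering command beta = K(x0) * x0 with a scheduled gain."""
    return scheduled_gain(x0, half_width) * x0
```

Each gain is a fixed linear design; the schedule only switches between them as the FOE moves toward the image center.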

There  are  various  ways  to  improve  the  control.  The  most  obvious  is  that,  in  order  to 
track  a  maneuvering  target  well,  we  might  use  a  more  sophisticated  tracking  method  for  the 
camera  that  builds  in  some  predictive  capability.  For  instance,  in  Birmiwal  et  al.  [7],  two 
state  models  of  the  target  were  used:  a  low-order  or  nonmaneuvering  model  and  a  high-order 
or  maneuvering  model.  When  a  maneuver  was  detected,  a  switch  from  the  low-order  to  the 
high-order  model  was  accomplished  by  adding  extra  state  components.  The  tracking  was 
then  done  with  the  augmented  state  model  until  reversion  to  the  normal  model  took  place 
as  a  result  of  another  decision.  The  change  in  dimension  of  the  filter  allows  acceleration  to 
be  modeled. 
4.3 More elaborate pursuit

We  next  consider  the  case  when  the  system  not  only  has  to  pursue  a  moving  target,  but  also 
has  to  keep  a  constant  distance  from  it.  To  give  an  example  from  nature  where  such  a  task  is 
required,  consider  the  following  situation:  A  male  hoverfly  Syritta  shadows  a  potential  mate 
until  the  latter  lands  on  a  flower.  Only  then  does  it  attempt  capture  by  accelerating  with 
a  constant  force  towards  its  target.  The  male  while  shadowing  keeps  a  constant  distance, 
about  10  cm,  from  its  quarry  [11].  Thus  the  strategy  might  consist  of  an  initial  phase  during 
which  the  target  is  pursued  with  full  speed.  When  the  target  is  within  a  certain  range,  a 
switch  in  the  control  scheme  is  used  to  maintain  the  desired  distance. 

Such  (and  similar)  problems  in  which  additional  constraints  are  imposed  on  the  system 
typically lead to optimal solutions with highly complex switching control schemes, the solution of which is beyond the scope of our paper. Here we present a scheme that, given the
information  on  time  to  contact,  regulates  its  speed  by  a  simple  proportional  controller. 

Having  estimated  the  time  to  contact  as  described  before,  we  use  this  information  to 
drive  a  separate  servo  loop  that  regulates  the  forward  speed  of  the  observer.  To  intercept 
the target as fast as possible, we set the desired time to contact to zero, and then use the
difference  between  the  desired  and  the  computed  time  to  contact  to  drive  the  servo  loop. 
A  desired  time  to  contact  with  nonzero  value  will  result  in  a  kind  of  stalking  behavior. 
Section  6,  which  describes  our  experiments,  will  present  results  on  both  these  scenarios. 
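The speed loop can be sketched as a small simulation. The model is our assumption: the target moves straight ahead at constant speed, the measured time to contact is the gap divided by the pursuer's own forward speed (the $Z/W$ of Section 4.1), and the time-to-contact error sets the pursuer's acceleration in proportion. With a nonzero desired value the pursuer settles at the target's speed and at a gap equal to `tau_desired` times that speed, which is the stalking behavior. All parameter values are hypothetical.

```python
def simulate_speed_loop(tau_desired, kp=0.5, target_speed=1.0, gap0=4.0,
                        speed0=1.0, dt=0.01, t_end=40.0):
    """Semi-implicit Euler simulation of a pursuer whose acceleration is
    proportional to the time-to-contact error (tau - tau_desired)."""
    gap, speed = gap0, speed0
    min_gap = gap
    for _ in range(int(round(t_end / dt))):
        tau = gap / speed                          # measured Z/W: gap over own forward speed
        speed += kp * (tau - tau_desired) * dt     # proportional action on the TTC error
        gap += (target_speed - speed) * dt         # target keeps moving straight ahead
        min_gap = min(min_gap, gap)
    return gap, speed, min_gap


gap, speed, min_gap = simulate_speed_loop(tau_desired=2.0)
```

With `tau_desired = 0.0` the commanded acceleration stays positive while the gap is positive, so the pursuer keeps speeding up until it reaches the target; the sketch does not model what happens after contact.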

5  Task  Three:  Perimeter  Following 
5.1  Estimating  functions  of  depth 

More  complex  than  the  understanding  of  rigid  motion  is  the  understanding  of  depth  and 
shape.  Our  viewpoint  is  that  it  is  not  necessary  to  compute  exact  depth  measurements  which 
are  very  hard  to  derive  and  whose  computation  requires  exact  knowledge  about  the  motion 
(or  stereo)  configuration.  Instead  we  could  aim  at  computing  less  informative  descriptions  of 
shape  and  depth,  such  as  functions  of  depth  and  shape  where  the  functions  are  such  that  they 
can  be  computed  easily  from  well-defined  image  information.  These  ideas  are  demonstrated 
here  by  means  of  the  task  of  perimeter  or  wall  following. 

Perimeter  following  in  our  application  is  described  as  follows:  A  robot  (car)  is  moving 
on a road which is bounded on one side by a wall-like perimeter. On the basis of visual information the robot has to control its steering in order to keep its distance from the perimeter
at  a  constant  value  and  maintain  its  forward  direction  as  nearly  parallel  to  the  perimeter  as 
possible.  The  perimeter  is  defined  as  a  planar  textured  structure  in  the  scene  perpendicular 
to  the  plane  of  the  road.  This  definition  includes  any  planar  structure  (connected  or  not) 
that  can  be  found  at  the  boundary  of  a  road  or  path,  such  as  walls,  houses  parallel  to  the 
road,  or  a  line  of  trees. 

This task requires us to compute from the spatio-temporal images some form of information about the distance between the perimeter and the robot. The most common way to address this problem is to perform reconstruction in order to obtain exact depth. Another approach requires us to compute the slopes of lines parallel to the road (boundary lines on highways). This means that the boundary first has to be detected, and thus the segmentation problem has to be solved. Unless the boundaries have image features which are clearly distinguishable from the rest of the image, this is a very difficult task. In the approach described here we compute some form of “qualitative” depth information which is sufficient for deducing the steering motion and which we can derive without first having to compute the motion parameters and without having to detect boundary lines.

Our  strategy  applied  to  the  perimeter  following  task  is  as  follows.  While  the  robot  is 
moving  forward  it  has  its  camera  directed  at  some  point  on  the  perimeter.  As  it  continues 
moving it maintains the relative orientation of the camera with regard to its forward translation. It compares distance information derived from flow fields obtained during its motion
with  distance  information  computed  from  a  flow  field  obtained  when  it  was  moving  parallel 
to  the  road.  This  distance  information  will  tell  what  the  robot’s  steering  direction  is  with 
respect  to  the  perimeter. 

The distance information we use is the scaled directional derivative of inverse depth along (imaginary) lines on the perimeter. From the observed flow field, normal flow measurements along (imaginary) lines through the image center are selected and compared to normal flow measurements along (imaginary) lines of equal slope in the reference flow field. This information is derived from the image of the whole perimeter, and thus is global. Furthermore, as in the previous tasks, we do not need to compute correspondence or optical flow, but only the normal flow components.


5.2  Direct  visual  depth  cue 


As the robot is moving along its path, the motion parameters perceived in the images change. For comparison purposes, we assume that the angle $\theta_y$ and the angle $\theta_x$ between the forward direction and the camera direction (determining $x_0$ and $y_0$) remain constant. The robot's rotational velocity (around the x-axis and y-axis) can change in any way; the technique is independent of these parameters. We next investigate the motion fields perceived during motion and how depth is encoded in the flow values.

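The motion perceived in the images is due to a translation $(U, V, W)$ and a rotation $(\alpha, \beta)$. Thus from (3), if we divide $u_n$ by $n_x$ (if $n_x \neq 0$), we obtain a function $f_n(\mathbf{x}, \mathbf{n}) = u_n/n_x$ of the image coordinates $\mathbf{x} = (x, y)$ and the normal direction $\mathbf{n} = (n_x, n_y)$:

$$f_n(\mathbf{x}, \mathbf{n}) = \frac{u_n}{n_x} = \frac{(-U + xW) + (-V + yW)\,\frac{n_y}{n_x}}{Z} + \alpha\left(\frac{xy}{f} + \left(\frac{y^2}{f} + f\right)\frac{n_y}{n_x}\right) - \beta\left(\frac{x^2}{f} + f + \frac{xy}{f}\,\frac{n_y}{n_x}\right) \qquad (19)$$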


Let us choose directions $(n_x, n_y)$ such that $x + y\,\frac{n_y}{n_x} = K$, with $K$ being some constant. These directions are the $(K, 0)$ copoint vectors, i.e. the vectors perpendicular to lines passing through the point $(K, 0)$. This can easily be seen by considering $(n_x, n_y)$ as the tangents to a family of curves and solving the differential equation for $y(x)$:

$$\frac{n_y}{n_x} = y'(x) = \frac{K - x}{y(x)},$$

which has as its solution

$$y^2 + (x - K)^2 = C, \qquad (20)$$

that is, a family of circles centered at $(K, 0)$, whose tangents are perpendicular to the radii through $(K, 0)$.
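The $(K, 0)$ copoint directions are easy to generate: at $(x, y)$ take $\mathbf{n} \propto (y, K - x)$, which is perpendicular to the line through $(K, 0)$ and $(x, y)$ and satisfies $x + y\,n_y/n_x = K$. A sketch with a hypothetical function name; the direction is undefined at $(K, 0)$ itself, and $n_x = 0$ on the x-axis, where $f_n$ cannot be formed.

```python
import math


def copoint_direction(x, y, K):
    """Unit normal of the (K, 0) copoint family at (x, y): n is proportional to (y, K - x)."""
    nx, ny = y, K - x
    norm = math.hypot(nx, ny)
    if norm == 0.0:
        raise ValueError("undefined at the copoint (K, 0) itself")
    return nx / norm, ny / norm


# Example with K = 1; K = 0 gives the lines through the image center used below.
nx, ny = copoint_direction(3.0, 2.0, 1.0)
```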


For these directions (19) becomes

$$f_n(\mathbf{x}, \mathbf{n}) = \frac{-U - V\frac{n_y}{n_x} + KW}{Z} + \alpha\left(f\frac{n_y}{n_x} + \frac{Ky}{f}\right) - \beta\left(f + \frac{Kx}{f}\right)$$


In the perimeter following task, we consider the normal flow along imaginary lines passing through the image center; thus $K = 0$ and we obtain

$$f_n(\mathbf{x}, \mathbf{n}) = \frac{u_n}{n_x} = \frac{-U - V\frac{n_y}{n_x}}{Z} + \alpha f\frac{n_y}{n_x} - \beta f \qquad (21)$$


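Along each of these lines $n_y/n_x$ is constant, and thus $(-U - V\,n_y/n_x)$ and $(\alpha f\,n_y/n_x - \beta f)$ are also constant, so (21) describes $f_n(\mathbf{x}, \mathbf{n})$ as a function that is linear in the inverse depth. For any two points $P_1$ and $P_2$ with coordinates $\mathbf{x}_1$ and $\mathbf{x}_2$ along such a line, the difference $f_n(\mathbf{x}_1, \mathbf{n}) - f_n(\mathbf{x}_2, \mathbf{n})$ is independent of the rotation. We thus compute the directional derivative $D(f_n(\mathbf{x}, \mathbf{n}))_{\mathbf{n}^\perp}$ of $f_n(\mathbf{x}, \mathbf{n})$ at points on lines with slope $k = -n_x/n_y$, in the direction of a unit vector $\mathbf{n}^\perp = (-n_y, n_x)$ parallel to the image lines. Dropping, in the notation of the directional derivative, the dependence of $f_n$ on $\mathbf{x}$ and $\mathbf{n}$, we obtain

$$D(f_n)_{\mathbf{n}^\perp} = \left(-U - V\,\frac{n_y}{n_x}\right) D\!\left(\frac{1}{Z}\right)_{\mathbf{n}^\perp} \tag{22}$$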
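The rotation-cancellation step behind (22) can be checked with a finite difference. The Python sketch below reuses the flow model assumed for (19), a hypothetical planar depth function, and a point on a line through the image center perpendicular to $\mathbf{n}$. Stepping along $\mathbf{n}^\perp = (-n_y, n_x)$ changes $f_n$ only through $1/Z$, so the result should not depend on $(\alpha, \beta)$ and should match the right-hand side of (22) for the same step.

```python
import math


def f_n(x, y, nx, ny, depth, U, V, W, alpha, beta, f):
    """Eq. (19) under the assumed flow model; depth(x, y) returns Z."""
    Z = depth(x, y)
    r = ny / nx
    u = (-U + x * W) / Z + alpha * x * y / f - beta * (x * x / f + f)
    v = (-V + y * W) / Z + alpha * (y * y / f + f) - beta * x * y / f
    return u + r * v


def difference_along_line(x, y, nx, ny, depth, h, **motion):
    """Finite difference of f_n along n_perp = (-ny, nx), divided by the step h."""
    px, py = -ny, nx
    a = f_n(x, y, nx, ny, depth, **motion)
    b = f_n(x + h * px, y + h * py, nx, ny, depth, **motion)
    return (b - a) / h


def plane_depth(x, y):
    # Hypothetical scene: a slanted plane, positive near the image center.
    return 5.0 + 0.01 * x + 0.02 * y


nx, ny = math.cos(0.7), math.sin(0.7)
x = 30.0
y = -x * nx / ny          # point on the line n_x x + n_y y = 0 (slope k = -n_x/n_y)
h = 1.0
common = dict(U=0.4, V=-0.3, W=1.2, f=500.0)

with_rotation = difference_along_line(x, y, nx, ny, plane_depth, h,
                                      alpha=0.01, beta=-0.02, **common)
no_rotation = difference_along_line(x, y, nx, ny, plane_depth, h,
                                    alpha=0.0, beta=0.0, **common)

# Right-hand side of (22) for the same finite step.
r = ny / nx
inv_z_a = 1.0 / plane_depth(x, y)
inv_z_b = 1.0 / plane_depth(x - h * ny, y + h * nx)
predicted = (-common["U"] - common["V"] * r) * (inv_z_b - inv_z_a) / h
```

Because $f_n$ is exactly linear in $1/Z$ along such a line, the finite step $h$ need not be small for this comparison.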



Figure  9:  Geometric  configuration  during  perimeter  following. 

We next derive $D(1/Z)_{\mathbf{n}^\perp}$. Referring to Figure 9, the camera is mounted on the robot and the robot is moving forward with velocity $S$. The camera is directed toward a point $F$ on the perimeter. We fix a coordinate system $XYZ$ to the robot such that the $Z$-axis is aligned with the optical axis of the camera and the $XY$-plane is perpendicular to it. Let $m$ be the line in the image parallel to $\mathbf{n}^\perp$ along which we compute depth, and let $M$ be the corresponding line in 3D on the perimeter. Let $Z_0$ be the depth at the fixation point and $\Phi$ the angle between the $Z$-axis and $M$. The depth $Z_P$ of any point $P$ on $M$ is


$$Z_P = Z_0 + \frac{L_P}{\tan\Phi},$$


where $L_P$ is the length of the parallel projection of $FP$ on the $XY$-plane. $L_P$ projects perspectively onto $l_P$ (along $\mathbf{n}^\perp$) in the image plane, where $l_P = f L_P / Z_P$; thus, dropping the subscript $P$, for any depth value $Z$ we obtain


$$\frac{1}{Z} = \frac{1}{Z_0} - \frac{l}{f Z_0 \tan\Phi}.$$


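For points with coordinates $l\,\mathbf{n}^\perp$ and gradient direction $\mathbf{n}$ we obtain

$$f_n(l\,\mathbf{n}^\perp, \mathbf{n}) = \left(-U - V\,\frac{n_y}{n_x}\right)\left(\frac{1}{Z_0} - \frac{l}{f Z_0 \tan\Phi}\right) + \alpha f\,\frac{n_y}{n_x} - \beta f,$$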

and  (22)  for  any  point  along  the  line  m  becomes 


$$D(f_n)_{\mathbf{n}^\perp} = \frac{U + V\,\frac{n_y}{n_x}}{f Z_0 \tan\Phi}. \tag{23}$$
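The chain from the perimeter geometry to (23) can be traced numerically. In the Python sketch below (parameter values arbitrary, names illustrative), depth along the image line is obtained by solving $Z = Z_0 + L/\tan\Phi$ together with $l = fL/Z$, the expression for $f_n(l\,\mathbf{n}^\perp, \mathbf{n})$ above is evaluated at two points, and the slope in $l$ is compared with the closed form (23).

```python
import math


def depth_on_line(l, Z0, Phi, f):
    """Solve Z = Z0 + L / tan(Phi) with L = l * Z / f for Z."""
    return Z0 / (1.0 - l / (f * math.tan(Phi)))


def f_n_on_line(l, U, V, r, alpha, beta, Z0, Phi, f):
    """f_n at the image point l * n_perp; r stands for n_y / n_x (constant on the line)."""
    Z = depth_on_line(l, Z0, Phi, f)
    return (-U - V * r) / Z + alpha * f * r - beta * f


def eq23(U, V, r, Z0, Phi, f):
    """Closed-form directional derivative D(f_n) along n_perp, Eq. (23)."""
    return (U + V * r) / (f * Z0 * math.tan(Phi))


p = dict(U=0.2, V=0.1, r=0.5, Z0=4.0, Phi=0.6, f=500.0)
l1, l2 = -40.0, 60.0
numeric = (f_n_on_line(l2, alpha=0.01, beta=0.02, **p)
           - f_n_on_line(l1, alpha=0.01, beta=0.02, **p)) / (l2 - l1)
closed_form = eq23(**p)
```

Since $1/Z$ is linear in $l$ for a straight line $M$, any two sample points give the same slope; the rotational terms drop out of the difference.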


In the remainder of this section we demonstrate the dependence of $D(f_n)_{\mathbf{n}^\perp}$ on the robot's steering direction. In particular, we show that $|D(f_n)_{\mathbf{n}^\perp}|$ along certain directions $\mathbf{n}^\perp$ decreases as the robot steers towards the perimeter; i.e., for any two flow fields corresponding to motion configurations $C_1$ and $C_2$ with parameters $(\Phi_1, Z_{0_1})$ and $(\Phi_2, Z_{0_2})$, $|D(f_n)_{\mathbf{n}^\perp}|_1 > |D(f_n)_{\mathbf{n}^\perp}|_2$ if $|\Phi_1| < |\Phi_2|$. Here, the absolute value provides a general notation that covers fixations both to the left and to the right of the robot.


5.3  Comparing  ordinal  depth  information 

We describe vectors with respect to three orthogonal coordinate systems $XYZ$, $X'Y'Z'$ and $X''Y''Z''$ that are related to each other by rotations (see Figure 10). The orientations of these coordinate systems are such that the $Z$-axis is parallel to the direction of translation when the robot is moving parallel to the perimeter, the $Z'$-axis is parallel to the robot's viewing direction in configuration $C_1$, and the $Z''$-axis is parallel to the robot's viewing direction in configuration $C_2$. The orientations of the frames $X'Y'Z'$ and $X''Y''Z''$ are related to the orientation of the frame $XYZ$ through rotation matrices $R'$ and $R''$, as described below, which depend on the parameters $\phi'_x, \phi'_y$ and $\phi''_x, \phi''_y$, respectively (the rotation around the x-axis is the same, $\phi''_x = \phi'_x$), where $|\phi''_y| > |\phi'_y|$.
Figure 10: Three reference coordinate systems.

Any vector $V$ in $XYZ$ corresponds to $V'$ in $X'Y'Z'$ and $V''$ in $X''Y''Z''$, where

$$V' = R'V, \quad V'' = R''V \quad \text{and} \quad V = R'^T V', \quad V = R''^T V'',$$

with $R'^T$ being the transpose of $R'$ and $R''^T$ the transpose of $R''$.

Let $m'$ (in $C_1$) be the line in the image with slope $k = -n_x/n_y$ along which measurements are taken, described by the line equation

$$m' : y' = k x'.$$


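In order to obtain a vector in 3D on the corresponding line $M'$ on the perimeter, we intersect the plane $Y' = kX'$ with the plane of the perimeter, which is at a distance $d$ from the robot and is thus described by the equation $X = d$ in the coordinate system $XYZ$, or $r'_{11}X' + r'_{21}Y' + r'_{31}Z' = d$ in the coordinate system $X'Y'Z'$, where $r'_{ij}$ are the entries of $R'$. Thus a unit vector $\mathbf{a}$ along $M'$ in $X'Y'Z'$ is computed as

$$\mathbf{a} = \frac{1}{\sqrt{1 + k^2 + \left(\frac{r'_{11} + k r'_{21}}{r'_{31}}\right)^2}} \begin{pmatrix} -1 \\ -k \\ \frac{r'_{11} + k r'_{21}}{r'_{31}} \end{pmatrix}$$

and a unit vector $\mathbf{b}$ along the $Z'$-axis in $X'Y'Z'$ is

$$\mathbf{b} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$

The cosine of the angle $\Phi_1$ between $\mathbf{a}$ and $\mathbf{b}$ is thus

$$\cos\Phi_1 = \mathbf{a}\cdot\mathbf{b} = \frac{\dfrac{r'_{11} + k r'_{21}}{r'_{31}}}{\sqrt{1 + k^2 + \left(\dfrac{r'_{11} + k r'_{21}}{r'_{31}}\right)^2}}$$

and

$$\tan\Phi_1 = \frac{r'_{31}\sqrt{k^2 + 1}}{r'_{11} + k r'_{21}}.$$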

Let $d$ be the distance from the camera to the perimeter; $d$ is measured along the $X$-axis in $XYZ$. Any vector $V$ in $XYZ$ that is parallel to the $Z'$-axis is described as $\lambda(r'_{31}, r'_{32}, r'_{33})^T$, with $\lambda$ a scalar. Thus for a vector $V$ of length $Z_{0_1}$ (so that $\lambda = Z_{0_1}$), $d = \lambda r'_{31}$, and the value of $Z_{0_1}$ amounts to

$$Z_{0_1} = \frac{d}{r'_{31}},$$

and thus

$$\frac{1}{\tan\Phi_1\, Z_{0_1}} = \frac{r'_{11} + k r'_{21}}{d\sqrt{k^2 + 1}}. \tag{24}$$
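Equation (24) can be checked by building the geometry explicitly. The NumPy sketch below assumes $R' = R_x(\phi'_x)R_y(\phi'_y)$ with frame-rotation matrices; this convention is an assumption, chosen because it reproduces $r'_{11} = \cos\phi'_y$ and $r'_{21} = -\sin\phi'_x\sin\phi'_y$ as used in the inequality following (26). It constructs $\mathbf{a}$, verifies that it lies in both planes, computes $\Phi_1$ with `arccos`, and compares $1/(\tan\Phi_1 Z_{0_1})$ with the right-hand side of (24). The sample angles are chosen so that $Z_{0_1} > 0$ under this convention.

```python
import numpy as np


def rot_x(a):
    """Frame rotation about the x-axis (assumed convention)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])


def rot_y(b):
    """Frame rotation about the y-axis (assumed convention)."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def eq24_check(phi_x, phi_y, k, d):
    R = rot_x(phi_x) @ rot_y(phi_y)          # V' = R V
    r11, r21, r31 = R[0, 0], R[1, 0], R[2, 0]
    c = (r11 + k * r21) / r31
    a = np.array([-1.0, -k, c]) / np.sqrt(1.0 + k**2 + c**2)  # unit vector along M'
    b = np.array([0.0, 0.0, 1.0])                             # Z'-axis
    Phi1 = np.arccos(a @ b)
    Z01 = d / r31                            # depth of the fixation point F
    lhs = 1.0 / (np.tan(Phi1) * Z01)
    rhs = (r11 + k * r21) / (d * np.sqrt(k**2 + 1.0))        # Eq. (24)
    return R, a, Z01, lhs, rhs


R, a, Z01, lhs, rhs = eq24_check(phi_x=-0.3, phi_y=-0.5, k=-1.5, d=2.0)
```

The test of `R[:, 0] @ a` expresses that $\mathbf{a}$ is a direction within the perimeter plane $r'_{11}X' + r'_{21}Y' + r'_{31}Z' = d$.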

Assuming that the time between measurements is small, and thus the horizontal distance $d$ in $C_2$ is the same as in $C_1$, we obtain for $C_2$ (see Figure 11)

$$\frac{1}{\tan\Phi_2\, Z_{0_2}} = \frac{r''_{11} + k r''_{21}}{d\sqrt{k^2 + 1}}. \tag{25}$$


Figure  11:  Comparing  the  values  from  two  configurations. 
Comparing (24) and (25), we are thus interested in values $k$ for which

$$|r'_{11} + k r'_{21}| > |r''_{11} + k r''_{21}| \tag{26}$$

or

$$|\cos\phi'_y - k\sin\phi'_x\sin\phi'_y| > |\cos\phi''_y - k\sin\phi'_x\sin\phi''_y|.$$

This inequality is true for all $k$ with $k\sin\phi'_x\sin\phi'_y > 0$, assuming that $|\phi'_y| < 45^\circ$ and $|\phi''_y| < 45^\circ$. (This assumption is used only to allow a general notation and is not needed if the different sign cases are listed separately.) On the basis of this result, the input to the control system for deriving the steering direction can be generated. At any point in time, $D(f_n)_{\mathbf{n}^\perp}$ for directions such that $k\sin\phi'_x\sin\phi'_y > 0$ is computed from the visual flow field and compared to prestored values of $D(f_n)_{\mathbf{n}^\perp}$ due to a forward motion parallel to the perimeter; the sign of the computed change in $|D(f_n)_{\mathbf{n}^\perp}|$ defines the sign of the change in the steering angle.
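The comparison rule just described can be sketched in a few lines of Python. The function names are hypothetical; `comparison_term` writes $|r_{11} + k r_{21}|$ with the camera angles as in the inequality above, and `steering_change_sign` returns only the sign of the change in $|D(f_n)_{\mathbf{n}^\perp}|$, since mapping that sign to a left or right steering command depends on which side the perimeter is on. The sample angles evaluate the inequality for one configuration only; they are not a proof of it.

```python
import math


def comparison_term(phi_x, phi_y, k):
    """|r_11 + k r_21| written with the camera angles, as in the inequality after (26)."""
    return abs(math.cos(phi_y) - k * math.sin(phi_x) * math.sin(phi_y))


def slope_is_usable(phi_x, phi_y, k):
    """Only image lines with k sin(phi_x) sin(phi_y) > 0 are used for the comparison."""
    return k * math.sin(phi_x) * math.sin(phi_y) > 0.0


def steering_change_sign(d_measured, d_reference):
    """Sign (+1, 0, -1) of the change in |D(f_n)| relative to the prestored reference."""
    delta = abs(d_measured) - abs(d_reference)
    return (delta > 0) - (delta < 0)


# Camera pointing down and to the right: phi_x < 0, phi_y > 0, so k must be negative.
phi_x, phi_y1, phi_y2, k = -0.3, 0.2, 0.4, -1.0
print(slope_is_usable(phi_x, phi_y1, k))                                      # True
print(comparison_term(phi_x, phi_y1, k) > comparison_term(phi_x, phi_y2, k))  # True
```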

In the analysis above, we assumed that the horizontal distance $d$ does not change. When this assumption does not hold, the same principle can still be applied to a smaller number of values if enough image gradients are available. If the distance $d$ decreased between $C_1$ and $C_2$, then $|r'_{11} + k r'_{21}|$ may be smaller than $|r''_{11} + k r''_{21}|$ for values of $|k|$ smaller than some threshold $T$, but must be larger for values of $|k|$ greater than $T$; thus, if gradients on a line with $|k| > T$ are available, these measurements can still be used for comparison.

Finally, we want to characterize the lines for which $k\sin\phi'_x\sin\phi'_y > 0$. Comparing such a line to the image of a line on the perimeter that is parallel to the road and passes through the image center, we find that the slopes of the two lines are of opposite sign. For example, if the camera's optical axis is pointing down and to the right (as in Figure 9), then $\phi'_x$ is negative, $\phi'_y$ is positive, and the slope of the parallel line is positive, whereas the slope of the line we use for comparison has to be negative.

6  Experiments 

This section presents results from simulations and from real images, obtained with the algorithms proposed in the previous sections.

A.  Task  1 

The first experiments used simulations of the robot's trajectory. The robot is initially moving at an angle relative to the z-axis. It is then required to steer itself so that it is heading along the z-axis, where the feature p is located at a distance of 5 m. The robot moves with a constant forward speed of 1.5 m/s.

The mobile platform is a conventionally steered vehicle; the instantaneous radius of curvature $r$ of its trajectory is related to the steering angle $\theta_r$ of its wheel as follows:

$$r = \frac{0.5L}{\sin\theta_r},$$

where $L$ is the body length of the vehicle. The field of view of the camera used in the simulation is 50°. We created the synthetic normal flow field from a scene with random depth and added zero-mean Gaussian noise with a standard deviation of 1 pixel to the normal flow measurements. For the estimation of $x_0$, we employed the α-vectors, where (as described in Section 3.2) we approximated the α-hyperbola by a straight line, which was estimated using a linear classifier.
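The steering kinematics above can be written as small helpers. In this Python sketch, `turning_radius` implements $r = 0.5L/\sin\theta_r$ as given, `yaw_rate` uses $\omega = v/r$ written as $v\sin\theta_r/(0.5L)$ so that straight motion ($\theta_r = 0$) is handled, and `proportional_steering` is a hypothetical proportional law with a clipping limit; the body length and the limit are assumed values, not taken from the experiments.

```python
import math


def turning_radius(theta_r, L):
    """Instantaneous radius of curvature r = 0.5 L / sin(theta_r); theta_r in radians."""
    return 0.5 * L / math.sin(theta_r)


def yaw_rate(v, theta_r, L):
    """Heading rate v / r, written so that theta_r = 0 (straight motion) gives 0."""
    return v * math.sin(theta_r) / (0.5 * L)


def proportional_steering(error, K, max_angle):
    """Hypothetical P law: steering angle K * error, clipped to +/- max_angle."""
    return max(-max_angle, min(max_angle, K * error))


# Forward speed 1.5 m/s as in the simulation; L = 2 m is an assumed body length.
theta = math.radians(30.0)
print(turning_radius(theta, 2.0), yaw_rate(1.5, theta, 2.0))
```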

Figure 12 shows the trajectories generated by the servo system for two different values of the proportionality constant $K$, namely $K = 0.1$ and $K = 0.3$. As can be seen, the system has a poor rise time and insufficient damping, which is typical of a proportional controller.




Figure  12:  Trajectory  generated  by  the  servo  system. 


Figure  13:  3D  configuration  as  studied  in  Task  1. 

In our experiments with real images, the mobile platform on which the camera was mounted was a conventionally steered vehicle. The camera had a focal length of 1136 pixels, and the image dimensions were 720 × 576; thus the field of view was approximately 30°. Because the steering movements must be small to limit inter-frame disparity and thus allow the computation of image flow, especially since the focal length of the camera was very large, we chose to operate the servo system with a proportionality constant of $K = 0.1$. Figure 14 shows some images taken by the camera mounted on the mobile platform as the platform makes steering movements. The feature p corresponds to the star in the center of the image, which was mounted on a tripod initially located at a distance of 5 m from the camera. Figure 13 displays the configuration of this setting, including the robot's trajectory. Figures 14a, c, and e show images taken by the system at three time instants (as marked in Figure 13) with the normal flow fields superimposed. Figures 14b, d, and f show the positive and negative α-vectors computed from the normal flow fields (in black and white) and the line approximating the α-hyperbola that has been fitted to the data.
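The field-of-view figure follows from the pinhole relation $\text{FOV} = 2\arctan(s/2f)$ for an image side of $s$ pixels and a focal length of $f$ pixels. The short Python sketch below evaluates it for both image axes of the Task 1 camera and, for comparison, for the Task 3 camera described later (focal length of about 1000 pixels, 512 × 512 images). Under this model the Task 1 camera covers roughly 35° horizontally and 28° vertically, and the Task 3 camera roughly 29°.

```python
import math


def field_of_view_deg(size_px, focal_px):
    """Full angular field of view of a pinhole camera along one image axis."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


horizontal = field_of_view_deg(720, 1136)   # Task 1 camera, image width
vertical = field_of_view_deg(576, 1136)     # Task 1 camera, image height
task3 = field_of_view_deg(512, 1000)        # Task 3 camera, either axis
print(round(horizontal, 1), round(vertical, 1), round(task3, 1))
```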



Figure  14:  Task  1:  Some  scenes  along  the  trajectory. 
B. Task 2


The  experiments  on  Task  2  were  based  on  simulations.  In  the  simple  type  of  pursuit,  the 
servo  operates  based  on  the  misalignment  between  the  visual  axis  and  the  axis  of  the  robot 
body.  It  is  not  concerned  with  the  distance  between  the  robot  and  the  target.  In  other 
words,  the  forward  speed  of  the  robot  is  not  regulated. 

Figure 15 depicts the trajectories computed for the following two scenarios. In Figure 15a, the target was moving to the left with a speed of 0.5 m/s; there was no motion along the z-axis. In Figure 15b, the target was receding in depth as well. The target was initially located 10 m away from the robot. In both cases the servo was operated with a PID controller with $K_P = 0.3$, $K_I = 0.03$, and $K_D = 0.1$, where $K_P$, $K_I$, and $K_D$ are the proportionality constant, the integral constant, and the derivative constant, respectively. The trajectory of the target is shown as a solid line, and that of the robot as a dotted curve. As can be seen by comparison with Figure 12, the system now has much better transient characteristics, owing to the PID control.
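A discrete PID controller of the kind used in these simulations can be sketched as follows. The gains are the Task 2 values; the sample time `dt`, the choice to skip the derivative term on the first sample, and the class name are assumptions of this sketch. The error would be the misalignment between the visual axis and the robot's body axis.

```python
class PID:
    """Discrete PID controller: u = Kp e + Ki * integral(e) + Kd * de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, error):
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Gains from the Task 2 simulations; dt = 1.0 is an assumed sample time.
controller = PID(kp=0.3, ki=0.03, kd=0.1, dt=1.0)
first = controller.update(1.0)
second = controller.update(0.5)
```

Skipping the derivative on the first sample avoids a spurious "derivative kick" when no previous error exists.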


Figure 15: Simple pursuit of a moving target: (a) Target moving along the x-axis. (b) Target
moving  off  at  an  angle.  The  dotted  path  represents  the  trajectory  of  the  robot;  the  solid  line 
represents  that  of  the  target. 

For the purpose of target interception, forward speed regulation is required in addition to the above control. Figure 16a shows the case where the speed is regulated so as to minimize the distance between the target and the pursuer as much as possible, subject to the speed limit of the robot. It can be seen that the target was captured within a much shorter time. Figure 16b shows the case where the criterion is to maintain a constant time to contact. As described in Section 4.3, the control was operated by setting the desired time to contact to zero and using the difference between the desired and the computed time to contact to drive the servo loop. The study shows that as the robot approaches the target, it slows down accordingly. Thus a kind of shadowing behavior is exhibited, and the target can be intercepted much more smoothly.
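Section 4.3 lies outside this excerpt, so the Python sketch below uses one standard image-based estimate rather than the paper's own procedure: for a pinhole camera the target's image size is $s = fS/Z$, so $s/(ds/dt) = -Z/\dot{Z} = \tau$. The speed update rule, its gain, and all function names are hypothetical and only illustrate how a time-to-contact error could drive forward speed.

```python
import math


def time_to_contact(size_prev, size_now, dt):
    """Estimate tau = s / (ds/dt) from the target's image size s (pinhole model).

    Uses a backward difference, so the estimate refers roughly to the previous
    frame. Returns math.inf if the target is not expanding in the image.
    """
    rate = (size_now - size_prev) / dt
    if rate <= 0.0:
        return math.inf
    return size_now / rate


def speed_command(v_now, tau, tau_desired, gain, v_max):
    """Hypothetical speed update driven by the time-to-contact error, clipped to [0, v_max]."""
    v = v_now + gain * (tau - tau_desired)
    return min(max(v, 0.0), v_max)


def target_image_size(Z, focal_px=100.0, width_m=1.0):
    """Synthetic image size in pixels of a target of the given width at depth Z."""
    return focal_px * width_m / Z


# Target closing at 1 m/s, observed at Z = 10 m and Z = 9.9 m, 0.1 s apart.
tau = time_to_contact(target_image_size(10.0), target_image_size(9.9), dt=0.1)
```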




Figure  16:  More  elaborate  pursuit  of  a  moving  target:  (a)  To  minimize  time  to  contact, 
(b)  To  maintain  a  constant  time  to  contact.  The  dotted  path  represents  the  trajectory  of 
the  robot;  the  solid  line  represents  that  of  the  target. 


C.  Task  3 

In  the  experiments  on  Task  3  a  mobile  platform  with  a  camera  mounted  on  it  moved  along 
an  alley-like  perimeter.  The  camera  had  a  focal  length  of  about  1000  pixels  and  the  image 
dimensions  were  512  x  512.  The  servomechanism  was  implemented  as  a  simple  proportional 
control  relating  the  robot’s  rotational  speed  around  the  y-axis  to  the  directional  derivative 
of  fn  (as  defined  in  Section  5.2).  The  scene  contained  a  highly  textured  perimeter.  We  thus 
derived  image  measurements  along  a  number  of  lines  and  used  the  mean  of  the  computed 
estimates  as  input  to  the  servo  system.  Figure  17a  shows  one  of  the  reference  images,  which 
was  taken  when  the  robot  was  moving  parallel  to  the  perimeter.  The  lines  along  which  image 
measurements  were  taken  are  overlaid  on  the  image  in  white.  Figure  17b  shows  the  normal 
flow field computed for this same image. Figures 17c and d display two more images taken while the robot moved along its path, one when it steered towards the perimeter and another later, when it again moved away from the perimeter.
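A minimal Python sketch of this servo input, under stated assumptions: along each measurement line $f_n$ is linear in the signed image distance $l$ (by (21) and the planar depth model of Section 5.2), so a least-squares line fit with `numpy.polyfit` estimates $D(f_n)_{\mathbf{n}^\perp}$; the estimates are averaged over lines as described above. The proportional law mapping the change in the mean $|D(f_n)_{\mathbf{n}^\perp}|$ to a rotational speed about the y-axis, its gain, and its sign convention are hypothetical.

```python
import numpy as np


def slope_along_line(l, fn):
    """Least-squares slope of f_n against the signed image distance l along one line."""
    slope, _intercept = np.polyfit(l, fn, 1)
    return slope


def rotation_command(lines, d_reference, gain):
    """Hypothetical proportional law: rotational speed about the y-axis from the
    change of the mean |D(f_n)| relative to a prestored reference value."""
    d_mean = np.mean([slope_along_line(l, fn) for l, fn in lines])
    return gain * (abs(d_mean) - abs(d_reference))


l = np.linspace(-50.0, 50.0, 21)
lines = [(l, 0.002 * l + 3.0), (l, 0.004 * l - 1.0)]   # synthetic, noise-free
omega_y = rotation_command(lines, d_reference=0.0025, gain=100.0)
```

With real normal flow the per-line fits would be noisy, which is one reason to average over many lines on a highly textured perimeter.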

Figure 17: Task 3: The robot moves along an “alley”.

7  Conclusions

A new way of making use of visual information for autonomous behavior has been presented. Visual representations, manifested through geometric constraints defined on the flow in various directions and on the normal flow, were used as input to the servomechanism.
Specifically,  the  constraints  described  are  global  patterns  in  the  sign  of  the  flow  in  different 
directions  whose  positions  and  forms  are  related  to  3D  motion,  and  patterns  of  normal  flow 
along  lines  in  the  image  which  encode  relative  3D  distance  information.  3D  motion  and 
structure  representations  derived  from  these  constraints  were  applied  to  the  solution  of  a 
number  of  navigational  problems  involving  the  control  of  a  system’s  3D  motion  with  respect 
to  its  environment  and  to  other  moving  objects.  Some  of  the  constraints  described,  however, 
are  of  a  general  nature,  and  thus  might  be  utilized  in  various  modified  forms  for  the  solution 
of  other  navigational  problems. 